With Gael Martin, Brendan McCabe, David T. Frazier, and Worapree Maneesoonthorn, we arXived (and submitted) a strongly revised version of our earlier paper. We begin by demonstrating that reduction to a set of sufficient statistics of reduced dimension relative to the sample size is infeasible for most state-space models, hence calling for the use of partial posteriors in such settings. Then we give conditions [like parameter identification] under which ABC methods are Bayesian consistent, when using an auxiliary model to produce summaries, either as MLEs or [more efficiently] scores. Indeed, for the order of accuracy required by the ABC perspective, scores are equivalent to MLEs but are computed much faster than MLEs. Those conditions happen to to be weaker than those found in the recent papers of Li and Fearnhead (2016) and Creel et al. (2015). In particular as we make no assumption about the limiting distributions of the summary statistics. We also tackle the dimensionality curse that plagues ABC techniques by numerically exhibiting the improved accuracy brought by looking at marginal rather than joint modes. That is, by matching individual parameters via the corresponding scalar score of the integrated auxiliary likelihood rather than matching on the multi-dimensional score statistics. The approach is illustrated on realistically complex models, namely a (latent) Ornstein-Ulenbeck process with a discrete time linear Gaussian approximation is adopted and a Kalman filter auxiliary likelihood. And a square root volatility process with an auxiliary likelihood associated with a Euler discretisation and the augmented unscented Kalman filter. In our experiments, we compared our auxiliary based technique to the two-step approach of Fearnhead and Prangle (in the Read Paper of 2012), exhibiting improvement for the examples analysed therein. Somewhat predictably, an important challenge in this approach that is common with the related techniques of indirect inference and efficient methods of moments, is the choice of a computationally efficient and accurate auxiliary model. But most of the current ABC literature discusses the role and choice of the summary statistics, which amounts to the same challenge, while missing the regularity provided by score functions of our auxiliary models.
Archive for Melbourne
I just heard that Peter Hall passed away yesterday in Melbourne. Very sad news from down under. Besides being a giant in the fields of statistics and probability, with an astounding publication record, Peter was also a wonderful man and so very much involved in running local, national and international societies. His contributions to the field and the profession are innumerable and his loss impacts the entire community. Peter was a regular visitor at Glasgow University in the 1990s and I crossed paths with him a few times, appreciating his kindness as well as his highest dedication to research. In addition, he was a gifted photographer and I recall that the [now closed] wonderful guest-house where we used to stay at the top of Hillhead had a few pictures of his taken in the Highlands and framed on its walls. (If I remember well, there were also beautiful pictures of the Belgian countryside by him at CORE, in Louvain-la-Neuve.) I think the last time we met was in Melbourne, three years ago… Farewell, Peter, you certainly left an indelible print on a lot of us.
[Song Chen from Beijing University has created a memorial webpage for Peter Hall to express condolences and share memories.]
Along with David Frazier and Gael Martin from Monash University, Melbourne, we have just completed (and arXived) a paper on the (Bayesian) consistency of ABC methods, producing sufficient conditions on the summary statistics to ensure consistency of the ABC posterior. Consistency in the sense of the prior concentrating at the true value of the parameter when the sample size and the inverse tolerance (intolerance?!) go to infinity. The conditions are essentially that the summary statistics concentrates around its mean and that this mean identifies the parameter. They are thus weaker conditions than those found earlier consistency results where the authors considered convergence to the genuine posterior distribution (given the summary), as for instance in Biau et al. (2014) or Li and Fearnhead (2015). We do not require here a specific rate of decrease to zero for the tolerance ε. But still they do not hold all the time, as shown for the MA(2) example and its first two autocorrelation summaries, example we started using in the Marin et al. (2011) survey. We further propose a consistency assessment based on the main consistency theorem, namely that the ABC-based estimates of the marginal posterior densities for the parameters should vary little when adding extra components to the summary statistic, densities estimated from simulated data. And that the mean of the resulting summary statistic is indeed one-to-one. This may sound somewhat similar to the stepwise search algorithm of Joyce and Marjoram (2008), but those authors aim at obtaining a vector of summary statistics that is as informative as possible. We also examine the consistency conditions when using an auxiliary model as in indirect inference. For instance, when using an AR(2) auxiliary model for estimating an MA(2) model. And ODEs.
Last week, on arXiv, Espen Bernton, Shihao Yang, Yang Chen, Neil Shephard, and Jun Liu (all from Harvard) proposed a weighting scheme to associated MCMC simulations, in connection with the parallel MCMC of Ben Calderhead discussed earlier on the ‘Og. The weight attached to each proposal is either the acceptance probability itself (with the rejection probability being attached to the current value of the MCMC chain) or a renormalised version of the joint target x proposal, either forward or backward. Both solutions are unbiased in that they have the same expectation as the original MCMC average, being some sort of conditional expectation. The proof of domination in the paper builds upon Calderhead’s formalism.
This work reminded me of several reweighting proposals we made over the years, from the global Rao-Blackwellisation strategy with George Casella, to the vanilla Rao-Blackwellisation solution we wrote with Randal Douc a few years ago, both of whom also are demonstrably improving upon the standard MCMC average. By similarly recycling proposed but rejected values. Or by diminishing the variability due to the uniform draw. The slightly parallel nature of the approach also connects with our parallel MCM version with Pierre Jacob (now Harvard as well!) and Murray Smith (who now leaves in Melbourne, hence the otherwise unrelated picture).
While it took quite a while (!), with several visits by three of us to our respective antipodes, incl. my exciting trip to Melbourne and Monash University two years ago, our paper on ABC for state space models was arXived yesterday! Thanks to my coauthors, Gael Martin, Brendan McCabe, and Worapree Maneesoonthorn, I am very glad of this outcome and of the new perspective on ABC it produces. For one thing, it concentrates on the selection of summary statistics from a more econometrics than usual point of view, defining asymptotic sufficiency in this context and demonstrated that both asymptotic sufficiency and Bayes consistency can be achieved when using maximum likelihood estimators of the parameters of an auxiliary model as summary statistics. In addition, the proximity to (asymptotic) sufficiency yielded by the MLE is replicated by the score vector. Using the score instead of the MLE as a summary statistics allows for huge gains in terms of speed. The method is then applied to a continuous time state space model, using as auxiliary model an augmented unscented Kalman filter. We also found in the various state space models tested therein that the ABC approach based on the marginal [likelihood] score was performing quite well, including wrt Fearnhead’s and Prangle’s (2012) approach… I like the idea of using such a generic object as the unscented Kalman filter for state space models, even when it is not a particularly accurate representation of the true model. Another appealing feature of the paper is in the connections made with indirect inference.
In 1993, we wrote a paper [with George Casella and Gene/Juinn Hwang] on the paradoxical consequences of using the loss function
(published in Statistica Sinica, 3, 141-155) since it led to the following property: for the standard normal mean estimation problem, the regular confidence interval is dominated by the modified confidence interval equal to the empty set when s² is too large… This was first pointed out by Jim Berger and the most natural culprit is the artificial loss function where the first part is unbounded while the second part is bounded by k. Recently, Paul Kabaila—whom I met in both Adelaide, where he quite appropriately commented about the abnormal talk at the conference!, and Melbourne, where we met with his students after my seminar at the University of Melbourne—published a paper (first on arXiv then in Statistics and Probability Letters) where he demonstrates that the mere modification of the above loss into
solves the paradox:! For Jeffreys’ non-informative prior, the Bayes (optimal) estimate is the regular confidence interval. besides doing the trick, this nice resolution explains the earlier paradox as being linked to a lack of invariance in the (earlier) loss function. This is somehow satisfactory since Jeffreys’ prior also is the invariant prior in this case.