Scandinavian picaresque, in the spirit of the novels of Paasilinna, and following another book by Jonas Jonasson already commented on the ‘Og, The Girl who saved the King of Sweden, but not as funny, because of the heavy recourse to World history, the main (100 year old) character meeting a large collection of major historical figures. And crossing the Himalayas when escaping from a Russian Gulag, which reminded me of this fantastic if possibly apocryphal The Long Walk where a group of Polish prisoners was making it through the Gobi desert to reach India and freedom (or death). The story here is funny but not that funny and once it is over, there is not much to say about it, which is why I left it on a bookshare table in Monash. The current events are somewhat dull, in opposition to the 100 year life of Allan, and the police enquiry a tad too predictable. Plus the themes are somewhat comparable to The Girl who …, with atom bombs, cold war, brothers hating one another…
Archive for Monash University
“This formulation reveals an interesting connection between multiple hypothesis testing and mixture modelling with the class labels corresponding to the accepted hypotheses in each test.”
After my seminar at Monash University last Friday, David Dowe pointed out to me the recent work by Enes Makalic and Daniel Schmidt on minimum description length (MDL) methods for multiple testing as somewhat related to our testing by mixture paper. Work which appeared in the proceedings of the 4th Workshop on Information Theoretic Methods in Science and Engineering (WITMSE-11), that took place in Helsinki, Finland, in 2011. Minimal encoding length approaches lead to choosing the model that enjoys the smallest coding length. Connected with, e.g., Rissannen‘s approach. The extension in this paper consists in considering K hypotheses at once on a collection of m datasets (the multiple then bears on the datasets rather than on the hypotheses). And to associate an hypothesis index to each dataset. When the objective function is the sum of (generalised) penalised likelihoods [as in BIC], it leads to selecting the “minimal length” model for each dataset. But the authors introduce weights or probabilities for each of the K hypotheses, which indeed then amounts to a mixture-like representation on the exponentiated codelengths. Which estimation by optimal coding was first proposed by Chris Wallace in his book. This approach eliminates the model parameters at an earlier stage, e.g. by maximum likelihood estimation, to return a quantity that only depends on the model index and the data. In fine, the purpose of the method differs from ours in that the former aims at identifying an appropriate hypothesis for each group of observations, rather than ranking those hypotheses for the entire dataset by considering the posterior distribution of the weights in the later. The mixture has somehow more of a substance in the first case, where separating the datasets into groups is part of the inference.
Taking advantage of being in San Francisco, I flew yesterday to Australia over the Pacific, crossing for the first time the day line. The 15 hour Qantas flight to Sydney was remarkably smooth and quiet, with most passengers sleeping for most of the way, and it gave me a great opportunity to go over several papers I wanted to read and review. Over the next week or so, I will work with my friends and co-authors David Frazier and Gael Martin at Monash University (and undoubtedly enjoy the great food and wine scene!). Before flying back to Paris (alas via San Francisco rather than direct).
With David Frazier and Gael Martin from Monash University, and with Judith Rousseau (Paris-Dauphine), we have now completed and arXived a paper entitled Asymptotic Properties of Approximate Bayesian Computation. This paper undertakes a fairly complete study of the large sample properties of ABC under weak regularity conditions. We produce therein sufficient conditions for posterior concentration, asymptotic normality of the ABC posterior estimate, and asymptotic normality of the ABC posterior mean. Moreover, those (theoretical) results are of significant import for practitioners of ABC as they pertain to the choice of tolerance ε used within ABC for selecting parameter draws. In particular, they [the results] contradict the conventional ABC wisdom that this tolerance should always be taken as small as the computing budget allows.
Now, this paper bears some similarities with our earlier paper on the consistency of ABC, written with David and Gael. As it happens, the paper was rejected after submission and I then discussed it in an internal seminar in Paris-Dauphine, with Judith taking part in the discussion and quickly suggesting some alternative approach that is now central to the current paper. The previous version analysed Bayesian consistency of ABC under specific uniformity conditions on the summary statistics used within ABC. But conditions for consistency are now much weaker conditions than earlier, thanks to Judith’s input!
- Li and Fearnhead (2015) considers an ABC algorithm based on kernel smoothing, whereas our interest is the original ABC accept-reject and its many derivatives
- our theoretical approach permits a complete study of the asymptotic properties of ABC, posterior concentration, asymptotic normality of ABC posteriors, and asymptotic normality of the ABC posterior mean, whereas Li and Fearnhead (2015) is only concerned with asymptotic normality of the ABC posterior mean estimator (and various related point estimators);
- the results of Li and Fearnhead (2015) are derived under very strict uniformity and continuity/differentiability conditions, which bear a strong resemblance to those conditions in Yuan and Clark (2004) and Creel et al. (2015), while the result herein do not rely on such conditions and only assume very weak regularity conditions on the summaries statistics themselves; this difference allows us to characterise the behaviour of ABC in situations not covered by the approach taken in Li and Fearnhead (2015);
With Gael Martin, Brendan McCabe, David T. Frazier, and Worapree Maneesoonthorn, we arXived (and submitted) a strongly revised version of our earlier paper. We begin by demonstrating that reduction to a set of sufficient statistics of reduced dimension relative to the sample size is infeasible for most state-space models, hence calling for the use of partial posteriors in such settings. Then we give conditions [like parameter identification] under which ABC methods are Bayesian consistent, when using an auxiliary model to produce summaries, either as MLEs or [more efficiently] scores. Indeed, for the order of accuracy required by the ABC perspective, scores are equivalent to MLEs but are computed much faster than MLEs. Those conditions happen to to be weaker than those found in the recent papers of Li and Fearnhead (2016) and Creel et al. (2015). In particular as we make no assumption about the limiting distributions of the summary statistics. We also tackle the dimensionality curse that plagues ABC techniques by numerically exhibiting the improved accuracy brought by looking at marginal rather than joint modes. That is, by matching individual parameters via the corresponding scalar score of the integrated auxiliary likelihood rather than matching on the multi-dimensional score statistics. The approach is illustrated on realistically complex models, namely a (latent) Ornstein-Ulenbeck process with a discrete time linear Gaussian approximation is adopted and a Kalman filter auxiliary likelihood. And a square root volatility process with an auxiliary likelihood associated with a Euler discretisation and the augmented unscented Kalman filter. In our experiments, we compared our auxiliary based technique to the two-step approach of Fearnhead and Prangle (in the Read Paper of 2012), exhibiting improvement for the examples analysed therein. Somewhat predictably, an important challenge in this approach that is common with the related techniques of indirect inference and efficient methods of moments, is the choice of a computationally efficient and accurate auxiliary model. But most of the current ABC literature discusses the role and choice of the summary statistics, which amounts to the same challenge, while missing the regularity provided by score functions of our auxiliary models.
Along with David Frazier and Gael Martin from Monash University, Melbourne, we have just completed (and arXived) a paper on the (Bayesian) consistency of ABC methods, producing sufficient conditions on the summary statistics to ensure consistency of the ABC posterior. Consistency in the sense of the prior concentrating at the true value of the parameter when the sample size and the inverse tolerance (intolerance?!) go to infinity. The conditions are essentially that the summary statistics concentrates around its mean and that this mean identifies the parameter. They are thus weaker conditions than those found earlier consistency results where the authors considered convergence to the genuine posterior distribution (given the summary), as for instance in Biau et al. (2014) or Li and Fearnhead (2015). We do not require here a specific rate of decrease to zero for the tolerance ε. But still they do not hold all the time, as shown for the MA(2) example and its first two autocorrelation summaries, example we started using in the Marin et al. (2011) survey. We further propose a consistency assessment based on the main consistency theorem, namely that the ABC-based estimates of the marginal posterior densities for the parameters should vary little when adding extra components to the summary statistic, densities estimated from simulated data. And that the mean of the resulting summary statistic is indeed one-to-one. This may sound somewhat similar to the stepwise search algorithm of Joyce and Marjoram (2008), but those authors aim at obtaining a vector of summary statistics that is as informative as possible. We also examine the consistency conditions when using an auxiliary model as in indirect inference. For instance, when using an AR(2) auxiliary model for estimating an MA(2) model. And ODEs.
How can one validate the outcome of a validation model? Or can we even imagine validation of this outcome? This was the starting question for the conference I attended in Hannover. Which obviously engaged me to the utmost. Relating to some past experiences like advising a student working on accelerated tests for fighter electronics. And failing to agree with him on validating a model to turn those accelerated tests within a realistic setting. Or reviewing this book on climate simulation three years ago while visiting Monash University. Since I discuss in details below most talks of the day, here is an opportunity to opt away! Continue reading