Archive for model choice
10:45 Oliver Ratmann (Duke University and Imperial College) – “Approximate Bayesian Computation based on summaries with frequency properties”
Approximate Bayesian Computation (ABC) has quickly become a valuable tool in many applied fields, but the statistical properties obtained by choosing a particular summary, distance function and error threshold are poorly understood. In an effort to better understand the effect of these ABC tuning parameters, we consider summaries that are associated with empirical distribution functions. These frequency properties of summaries suggest what kind of distance functions are appropriate, and the validity of the choice of summaries can be assessed on the fly during Monte Carlo simulations. Among valid choices, uniformly most powerful distances can be shown to optimize the ABC acceptance probability. Considering the binding function between the ABC model and the frequency model of the summaries, we can characterize the asymptotic consistency of the ABC maximum-likelihood estimate in general situations. We provide examples from phylogenetics and dynamical systems to demonstrate that empirical distribution functions of summaries can often be obtained without expensive re-simulations, so that the above theoretical results are applicable in a broad set of applications. In part, this work will be illustrated on fitting phylodynamic models that capture the evolution and ecology of interpandemic influenza A (H3N2) to incidence time series and the phylogeny of H3N2's immunodominant haemagglutinin gene.
I however benefited enormously from hearing the talk again and also from discussing the fundamentals of his approach before and after the talk (in the nearest Aussie pub!). Olli’s approach is (once again!) rather iconoclastic in that he presents ABC as a testing procedure, using frequentist tests and concepts to build an optimal acceptance condition. Since he manipulates several error terms simultaneously (as before), he needs to address the issue of multiple testing but, thanks to a switch between acceptance and rejection, null and alternative, the individual α-level tests get turned into a global α-level test.
The afternoon on model choice at the London School of Hygiene (!) and Tropical Medicine was worth the short trip from Paris, especially when the weather in London felt like real summer: walking in the streets was a real treat! The talks were also interesting in that the emphasis was off-key from my usual statistics talks and thus required more focus from me. The first talk by Stijn Vansteelandt emphasized (very nicely) the role of confounders and exposure in causal inference in ways that were novel to me (although it seems in the end that a proper graphical modelling of all quantities involved in the process would allow for a standard statistical analysis). I also had trouble envisioning the Bayesian version of the approach, although Stijn referred to a recent paper by Wang et al. While Stijn has a joint paper in the Series B that just arrived on my desk, this talk is more related to a paper to appear in Statistical Methodology in Medical Research. (The second talk was mine and presumably too technical in that I should have gotten rid of the new mathematical assumptions [A1]-[A4] altogether.) The third was a fascinating statistical analysis by Doug Speed of an important genetic heritability paper, by Yang et al., where he took the assumptions of the model one at a time to see how they were impacting the conclusions and found that none was to blame. The fourth and final talk by David Clayton covered the role of link functions in GLMs applied to epidemiological models, in connection with older papers from the 1990s, to conclude that the choice of the link function mattered for the statistical properties of the variable selection procedures, which I found a bit puzzling based on my (limited) econometric intuition that all link functions lead to consistent pseudo-models. In any case, this was a fairly valuable meeting, furthermore attended by a very large audience.
As mentioned in the latest post on ABC, I am giving a short doctoral course on ABC methods and convergence at CREST next week. I have now made a preliminary collection of my slides (plus a few from Jean-Michel Marin’s), available on slideshare (as ABC in Roma, because I am also giving the course in Roma, next month, with an R lab on top of it!):
and I did manage to go over the book by Gouriéroux and Monfort on indirect inference over the weekend. I still need to beef up the slides before the course starts next Thursday! (The core version of the slides is actually from the course I gave in Wharton more than a year ago.)
Today, I am attending a workshop on the use of graphics processing units in Statistics in Warwick, supported by CRiSM, presenting our recent works with Randal Douc, Pierre Jacob and Murray Smith. (I will use the same slides as in Telecom two months ago, hopefully avoiding the loss of integral and summation signs this time!) Pierre Jacob will talk about Wang-Landau.
Yesterday, we had a meeting of our EMILE network on statistics for population genetics (in Montpellier) and we were discussing our respective recent advances in ABC model choice. One of our colleagues mentioned the constant request (from referees) to include the post-ABC processing devised by Fagundes et al. in their 2007 ABC paper. (This paper contains a wealth of statistical innovations, but I only focus here on this post-checking device.)
The method centres around the above figure, with the attached caption
Fig. 4. Empirical distributions of the estimated relative probabilities of the AFREG model when the AFREG (solid line), MREBIG (dashed line), and ASEG (dotted line) models are the true models. Here, we simulated 1,000 data sets under the AFREG, MREBIG, and ASEG models by drawing random parameter values from the priors. The density estimates of the three models at the AFREG posterior probability = 0.781 (vertical line) were used to compute the probability that AFREG is the correct model given our observation that PAFREG = 0.781. This probability is equal to 0.817.
which aims at computing a p-value based on the ABC estimate of the posterior probability of a model.
I am somewhat uncertain about the added value of this computation and about the paradox of the sentence “the probability that AFREG is the correct model [given] the AFREG posterior probability (..) is equal to 0.817”… If I understand correctly the approach followed by Fagundes et al., they simulate samples from the joint distribution over parameter and (pseudo-)data conditional on each model, then approximate the density of the [ABC estimated] posterior probabilities of the AFREG model by a nonparametric density estimate, presumably density(), which amounts, in Bayesian terms, to estimating the marginal likelihoods (or evidences) of the posterior probability of the AFREG model under each of the models under comparison. The “probability that AFREG is the correct model given our observation that PAFREG = 0.781” is then completely correct in the sense that it is truly a posterior probability for this model based on the sole observation of the transform (or statistic) of the data x equal to PAFREG(x). However, if we only look at the Bayesian perspective and do not consider the computational aspects, there is no rationale in moving from the data (or from the summary statistics) to a single statistic equal to PAFREG(x), as this induces a loss of information. (Furthermore, it seems to me that the answer is not invariant against the choice of the model whose posterior probability is computed, if more than two models are compared. In other words, the posterior probability of the AFREG model given the sole observation of PAFREG(x) is not necessarily the same as the posterior probability of the AFREG model given the sole observation of PASEG(x)…) Although this is not at all advised by the paper, it seems to me that some users of this processing opt instead for simulations of the parameter taken from the ABC posterior, which amounts to using the “data twice”, i.e. the squared likelihood instead of the likelihood… So, while the procedure is formally correct (despite Templeton’s arguments against it), it has no added value. Obviously, one could alternatively argue that the computational precision in approximating the marginal likelihoods is higher with the (nonparametric) solution based on PAFREG(x) than the (ABC) solution based on x, but this is yet to be demonstrated (and weighed against the information loss).
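To make the post-processing step concrete, here is a minimal Python sketch of the computation as I read it: density estimates of the statistic PAFREG(x) under each model, evaluated at the observed value and renormalised with prior model weights. Everything in it is an assumption for illustration: the function names, the equal prior weights, the rule-of-thumb bandwidth, and above all the Beta draws, which merely stand in for the 1,000 ABC-estimated posterior probabilities simulated under each model (they are not the values of Fagundes et al.).

```python
import math
import random


def kernel_density(sample, bandwidth=None):
    """Return a function evaluating a Gaussian kernel density estimate."""
    n = len(sample)
    if bandwidth is None:
        mean = sum(sample) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
        # Normal-reference rule of thumb (an assumption, not the paper's choice)
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample
        )

    return density


def posterior_given_statistic(t_obs, stats_by_model, prior=None):
    """Posterior model probabilities given only the scalar statistic t_obs.

    stats_by_model maps a model name to the values of the statistic
    computed on data sets simulated under that model.
    """
    models = list(stats_by_model)
    if prior is None:
        # Equal prior weights on the models (assumption)
        prior = {m: 1.0 / len(models) for m in models}
    weights = {
        m: prior[m] * kernel_density(stats_by_model[m])(t_obs) for m in models
    }
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


# Hypothetical stand-ins for the simulated values of PAFREG(x) per model
rng = random.Random(0)
stats_by_model = {
    "AFREG": [rng.betavariate(5, 2) for _ in range(1000)],
    "MREBIG": [rng.betavariate(2, 5) for _ in range(1000)],
    "ASEG": [rng.betavariate(2, 4) for _ in range(1000)],
}

post = posterior_given_statistic(0.781, stats_by_model)
for model, prob in post.items():
    print(f"{model}: {prob:.3f}")
```

The sketch also makes the information loss visible: the only input is the scalar 0.781, so two data sets with the same value of PAFREG(x) cannot be told apart, whatever their other summary statistics.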
Just as a side remark on the polychotomous logistic regression approximation to the posterior probabilities introduced in Fagundes et al.: the idea is quite enticing, as a statistical regularisation of ABC simulations. It could be exploited further by using a standard model selection strategy in order to pick the summary statistics that truly contribute to explaining the model index.
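As a rough illustration of the regression idea, the sketch below regresses the model index on simulated summary statistics with a plain softmax (polychotomous logistic) model fitted by gradient descent, then reads off estimated model probabilities at an observed summary. This is not the fitting procedure of Fagundes et al.: the unweighted fit, the two Gaussian toy summaries, the learning rate and all the names are assumptions made for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_softmax_regression(X, y, n_classes, lr=0.1, n_iter=500):
    """Fit a multinomial logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    # One weight vector per model index; the last entry is the intercept
    W = [[0.0] * (d + 1) for _ in range(n_classes)]
    for _ in range(n_iter):
        grad = [[0.0] * (d + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            feats = list(xi) + [1.0]
            probs = softmax(
                [sum(w * f for w, f in zip(Wk, feats)) for Wk in W]
            )
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j in range(d + 1):
                    grad[k][j] += err * feats[j] / n
        for k in range(n_classes):
            for j in range(d + 1):
                W[k][j] -= lr * grad[k][j]
    return W


def predict_proba(W, x):
    """Estimated model probabilities at the summary statistics x."""
    feats = list(x) + [1.0]
    return softmax([sum(w * f for w, f in zip(Wk, feats)) for Wk in W])


# Toy ABC reference table: model index k and two summary statistics
# simulated under model k (Gaussian stand-ins, purely hypothetical)
rng = random.Random(1)
means = [(-2.0, 0.0), (0.0, 2.0), (2.0, 0.0)]
X, y = [], []
for k, (m1, m2) in enumerate(means):
    for _ in range(100):
        X.append((rng.gauss(m1, 1.0), rng.gauss(m2, 1.0)))
        y.append(k)

W = fit_softmax_regression(X, y, n_classes=3)
probs = predict_proba(W, (1.8, 0.1))  # hypothetical observed summaries
print([round(p, 3) for p in probs])
```

Selecting summary statistics would then come down to comparing such fits with and without a given column of X, using whichever standard model selection criterion one prefers.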