## potentially relevant

Posted in pictures, Statistics, Travel, University life on March 14, 2012 by xi'an

This week, freshly back from Roma, I got the reviews on our paper “Relevant statistics for Bayesian model choice” from Series B. The comments are detailed and mostly to the point, expressing concern about the relevance of the paper for statistical methodology as the major issue.  We are thus asked for a revision making a much better connection with ABC methodology.

This is not an unexpected outcome, from my point of view, because the paper is indeed quite theoretical and the mathematical assumptions required to obtain the convergence theorems are rather overwhelming… Meaning that in practical cases they cannot truly be checked. However, I think we can eventually address those concerns for two distinct reasons: first, the paper comes as a third step in a series of papers where we first identified a sufficiency property, then realised that this property was actually quite a rare occurrence, and finally made a theoretical advance as to when a summary statistic is enough (i.e. “sufficient” in the standard sense of the term!) to conduct model choice, the clear answer being that the ranges of the mean of the summary statistic under the two models must not intersect. Second, my own personal view is that those assumptions needed for convergence are not of the highest importance for statistical practice (even though they are needed in the paper!) and thus that, from a methodological point of view, only the conclusion should be taken into account. It is then rather straightforward to come up with (quick-and-dirty) simulation devices to check whether a summary statistic behaves differently under the two models, taking advantage of the reference table already available (instead of having to run Monte Carlo experiments on an ABC basis)…
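As a rough illustration of such a quick-and-dirty check, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than something taken from the paper: the two toy models (normal versus Laplace with the same variance), the N(0, 2²) prior on the location, the two candidate statistics (sample mean and median absolute deviation), and the `separation` score, which is simply a standardised gap between the per-model averages of a statistic in the reference table.

```python
import math
import random
import statistics


def mad(xs):
    """Median absolute deviation about the median."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)


def simulate(model, theta, n_obs, rng):
    """Toy models (assumed): 1 = N(theta, 1), 2 = Laplace(theta) with variance 1."""
    if model == 1:
        return [rng.gauss(theta, 1.0) for _ in range(n_obs)]
    b = 1.0 / math.sqrt(2.0)  # Laplace scale giving unit variance
    # The difference of two exponentials with mean b is Laplace(0, b).
    return [theta + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
            for _ in range(n_obs)]


def reference_table(n_sim, n_obs, rng):
    """Rows of (model index, sample mean, MAD), as in an ABC reference table."""
    table = []
    for _ in range(n_sim):
        model = rng.choice((1, 2))    # uniform prior on the model index
        theta = rng.gauss(0.0, 2.0)   # assumed prior on the location
        x = simulate(model, theta, n_obs, rng)
        table.append((model, statistics.fmean(x), mad(x)))
    return table


def separation(table, column):
    """Standardised gap between the per-model averages of one statistic."""
    groups = {1: [], 2: []}
    for row in table:
        groups[row[0]].append(row[column])
    gap = abs(statistics.fmean(groups[1]) - statistics.fmean(groups[2]))
    spread = math.sqrt(
        (statistics.variance(groups[1]) + statistics.variance(groups[2])) / 2.0
    )
    return gap / spread


rng = random.Random(2012)
table = reference_table(n_sim=1000, n_obs=100, rng=rng)
sep_mean = separation(table, 1)  # sample mean
sep_mad = separation(table, 2)   # median absolute deviation
print(f"separation, sample mean: {sep_mean:.3f}")
print(f"separation, MAD:         {sep_mad:.3f}")
```

On this toy table the sample mean produces a separation close to zero, while the MAD produces a value well above one, which suggests that only the latter has a chance of discriminating between the two models. This is a diagnostic on simulated averages, not a substitute for the conditions of the paper.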

One of the comments was that maybe Bayes factors were not appropriate for conducting model choice, thus making the whole derivation irrelevant. This is a possible perspective but it can be objected that Bayes factors and posterior probabilities are used in conjunction with ABC in dozens of genetic papers. Further arguments are provided in the various replies to both of Templeton’s radical criticisms. That more empirical and model-based assessments are also available is quite correct, as demonstrated in the multicriterion approach of Olli Ratmann and co-authors. This is simply another approach, not followed by most geneticists so far…

## MCMC with control variates

Posted in Books, Statistics, University life on February 17, 2012 by xi'an

In the latest issue of JRSS Series B (74(1), Jan. 2012), I just noticed that no paper is “from my time” as co-editor, i.e. that all of them were submitted after I completed my term in Jan. 2010. Given the two-year delay, this is not that surprising, but it also means I can make comments on some papers w/o reservation! A paper I had seen earlier (as a reader, not as an editor nor as a referee!) is Petros Dellaportas’ and Ioannis Kontoyiannis’ Control variates for estimation based on reversible Markov chain Monte Carlo samplers. The idea is one of post-processing MCMC output, by stabilising the empirical average via control variates. There are two difficulties, one in finding control variates, i.e. functions $\Psi(\cdot)$ with zero expectation under the target distribution, and another one in estimating the optimal coefficient in a consistent way. The paper solves the first difficulty by using the Poisson equation, namely that $G(x)-KG(x)$ has zero expectation under the stationary distribution associated with the Markov kernel $K$. Therefore, if $KG$ can be computed in closed form, this is a generic control variate taking advantage of the MCMC algorithm. Of course, the above if is a big if: it seems difficult to find closed-form solutions when using a Metropolis-Hastings algorithm, for instance, and the paper only contains illustrations within the conjugate prior/Gibbs sampling framework. The second difficulty is also addressed by Dellaportas and Kontoyiannis, who show that the asymptotic variance in the resulting central limit theorem can be equal to zero in some cases.
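To make the construction concrete, here is a minimal Python sketch on a toy chain where $KG$ is available in closed form. The Gaussian AR(1) kernel, the choice $G(x)=x^2$ and the least-squares coefficient are all assumptions made for illustration; in particular, the plain least-squares coefficient is a stand-in and not the estimator proposed by Dellaportas and Kontoyiannis. The only identity used is that $E_\pi[G-KG]=0$ whenever $\pi$ is stationary for $K$, since then $E_\pi[KG]=E_\pi[G]$.

```python
import math
import random
import statistics

RHO = 0.9  # autocorrelation of the toy chain (assumed value)


def ar1_chain(n, rho, rng):
    """Gaussian AR(1) chain, reversible with respect to its N(0, 1) stationary law."""
    x = rng.gauss(0.0, 1.0)
    chain = []
    for _ in range(n):
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


def control_variate(x, rho):
    """U(x) = G(x) - KG(x) for G(x) = x^2, with KG(x) = rho^2 x^2 + (1 - rho^2)."""
    kg = rho * rho * x * x + (1.0 - rho * rho)
    return x * x - kg


def cv_estimate(f_vals, u_vals):
    """Return (mean(F) - theta * mean(U), theta), theta being the least-squares slope."""
    f_bar = statistics.fmean(f_vals)
    u_bar = statistics.fmean(u_vals)
    cov = sum((f - f_bar) * (u - u_bar) for f, u in zip(f_vals, u_vals))
    var = sum((u - u_bar) ** 2 for u in u_vals)
    theta = cov / var
    return f_bar - theta * u_bar, theta


rng = random.Random(74)
chain = ar1_chain(5000, RHO, rng)
f_vals = [x * x for x in chain]                    # F(x) = x^2, true mean 1
u_vals = [control_variate(x, RHO) for x in chain]  # zero mean under N(0, 1)

plain = statistics.fmean(f_vals)
corrected, theta = cv_estimate(f_vals, u_vals)
print(f"plain average:         {plain:.6f}")
print(f"with control variate:  {corrected:.6f}")
print(f"estimated coefficient: {theta:.6f}")
```

In this deliberately degenerate case $F$ is itself proportional to the solution of the Poisson equation, so the corrected average returns the true value 1 up to rounding error, whatever the realisation of the chain: an instance of the zero asymptotic variance mentioned above. For a generic $F$ the reduction would only be partial.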

## comments in ABC PhD course

Posted in pictures, Statistics, University life on February 13, 2012 by xi'an

Following my reading of the discussions of the Read Paper by Fearnhead and Prangle, I included some of their points in my course this morning. Which ended up with me spending the whole two hours on this topic (and finally getting a grasp on calibration!). Here is [hopefully] the final version of the slides.

## Read Paper at the Royal Statistical Society

Posted in Statistics, Travel, University life on December 14, 2011 by xi'an

This afternoon, I will attend the Read Paper session in London, presented by Paul Fearnhead and Dennis Prangle on semi-automatic ABC. I have already commented on the paper (as a referee, external examiner and blogger!) and provided my slides for our local pre-ordinary meeting at CREST, so here is my written discussion (maybe to be turned into discussions due to its length!). (I just hope my flight from the US won’t be cancelled or overly delayed…)

## Catching up faster by switching sooner

Posted in R, Statistics, University life on October 26, 2011 by xi'an

Here is our discussion (with Nicolas Chopin) of the Read Paper of last Wednesday by T. van Erven, P. Grünwald and S. de Rooij (Centrum voor Wiskunde en Informatica, Amsterdam), entitled Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the Akaike information criterion–Bayesian information criterion dilemma. It is still available for written discussions, to be published in Series B. Even though the topic is quite tangential to our interests, the fact that the authors evolve in a Bayesian environment called for the following (my main contribution being in pointing out that the procedure is not Bayesian by failing to incorporate the switch in the predictive (6), hence using the same data for all models under competition…):

Figure 1 - Bayes factors of Model 2 vs. Model 1 (gray line) and Model 3 vs. Model 1 (dark line), plotted against the number of observations, i.e. of iterations, when comparing three stochastic volatility models; see Chopin et al. (2011) for full details.

This paper is an interesting attempt at a particularly important problem. We nonetheless believe more classical tools should be used instead if models are truly relevant in the inference led by the authors: Figure 1, reproduced from Chopin et al. (2011), plots [against time] the Bayes factors of Models 2 and 3 vs. Model 1, where all models are state-space models of increasing complexity, fitted to some real data. In this context, one often observes that more complex models need more time to “ascertain themselves”. On the other hand, even BMA based prediction is a very challenging computational problem (the only generic solution currently being the SMC² algorithm of the aforementioned paper), and we believe that the current proposed predictive strategy will remain too computationally expensive for practical use for nonlinear state-space models.

For other classes of models, since the provable methods put forward by this paper are based on “frozen strategies”, which are hard to defend from a modelling perspective, and since the more reasonable “basic switch” strategy seems to perform as well numerically, we would be curious to see how the proposed methods compare to predictive distributions obtained from genuine Bayesian models. A true change point model for instance would generate a coherent prediction strategy, which is not equivalent to the basic switch strategy. (Indeed, for one thing, the proposal made by the authors utilises the whole past to compute the switching probabilities, rather than allocating the proper portion of the data to the relevant model. In this sense, the proposal is “using the data [at least] twice” in a pseudo-Bayesian setting, similar to Aitkin’s (1991).) More generally, the authors seem to focus on situations where the true generative process is a non-parametric class, and the completed models form an infinite sequence of richer and richer (but also more and more complex) parametric models, which is a very sensible set-up in practice. Then, we wonder whether or not it would make more sense to set the prior distribution over the switch parameter s in such a way that (a) switches only occur from one model to another model with greater complexity and (b) the number of switches is infinite.

For ABC readers, note the future Read Paper meeting on December 14 by Paul Fearnhead and Dennis Prangle.