## Measuring statistical evidence using relative belief [book review]

Posted in Books, Statistics, University life on July 22, 2015 by xi'an

“It is necessary to be vigilant to ensure that attempts to be mathematically general do not lead us to introduce absurdities into discussions of inference.” (p.8)

This new book by Michael Evans (Toronto) summarises his views on statistical evidence (expanded in a large number of papers), which are a quite unique mix of Bayesian principles and less-Bayesian methodologies. I am quite glad I could receive a version of the book before it was published by CRC Press, thanks to Rob Carver (and Keith O’Rourke for warning me about it). [Warning: this is a rather long review and post, so readers may choose to opt out now!]

“The Bayes factor does not behave appropriately as a measure of belief, but it does behave appropriately as a measure of evidence.” (p.87)

## Approximate Bayesian model choice

Posted in Books, R, Statistics, Travel, University life on March 17, 2014 by xi'an

The above is the running head of the arXived paper with full title “Implications of uniformly distributed, empirically informed priors for phylogeographical model selection: A reply to Hickerson et al.” by Oaks, Linkem and Sukumaran, which I (again) read on the plane to Montréal (third one in this series!, and last because I also watched the Japanese psycho-thriller Midsummer’s Equation featuring a physicist turned detective in one of many TV episodes. I just found some common features with The Devotion of Suspect X, only to discover now that the book has been turned into another episode in the series.)

“Here we demonstrate that the approach of Hickerson et al. (2014) is dangerous in the sense that the empirically-derived priors often exclude from consideration the true values of the models’ parameters. On a more fundamental level, we question the value of adopting an empirical Bayesian stance for this model-choice problem, because it can mislead model posterior probabilities, which are inherently measures of belief in the models after prior knowledge is updated by the data.”

This paper actually is a reply to Hickerson et al. (2014, Evolution), which is itself a reply to an earlier paper by Oaks et al. (2013, Evolution). [Warning: I did not check those earlier references!] The authors object to the use of “narrow, empirically informed uniform priors” for the reason reproduced in the above quote, in connection with the msBayes of Huang et al. (2011, BMC Bioinformatics). The discussion is less about ABC used for model choice and posterior probabilities of models and more about the impact of vague priors, Oaks et al. (2013) arguing that this leads to a bias towards models with fewer parameters, a “statistical issue” in their words, while Hickerson et al. (2014) think this is due to msBayes’ way of selecting models and their parameters at random.

“…it is difficult to choose a uniformly distributed prior on divergence times that is broad enough to confidently contain the true values of parameters while being narrow enough to avoid spurious support of models with less parameter space.”

So quite an interesting debate that takes us in fine far away from the usual worries about ABC model choice! We are more at the level of empirical versus natural Bayes, as seen in the literature of the 80’s. (The meaning of empirical Bayes is not that clear in the early pages as the authors seem to include any method using the data “twice”.) I actually do not remember reading papers about the formal properties of model choice done through classical empirical Bayes techniques, except for the special case of Aitkin’s (1991, 2009) integrated likelihood, which is essentially the analysis performed on the coin toy example (p.7).

“…models with more divergence parameters will be forced to integrate over much greater parameter space, all with equal prior density, and much of it with low likelihood.”

The above argument is an interesting rephrasing of Lindley’s paradox, which I cannot dispute, but of course it does not solve the fundamental issue of how to choose the prior away from vague uniform priors… I also like the quote “the estimated posterior probability of a model is a single value (rather than a distribution) lacking a measure of posterior uncertainty” as this is an issue on which we are currently working. I fully agree with the statement and we think an alternative assessment to posterior probabilities could be more appropriate for model selection in ABC settings (paper soon to come, hopefully!).
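To make the Lindley-type effect concrete, here is a minimal sketch on a toy problem that is not taken from the paper: a normal sample mean with known variance, a point null θ=0 against a uniform prior on (-a, a). The function names, the data summary (x̄=0.3, n=50) and the prior half-widths are all assumptions chosen for illustration. The only point is that, for fixed data, the Bayes factor in favour of the simpler model grows roughly linearly with the width of the uniform prior.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_factor_01(xbar, n, half_width, sigma=1.0):
    """Bayes factor of M0: theta = 0 against M1: theta ~ U(-a, a),
    for a sample mean xbar ~ N(theta, sigma^2 / n).
    (Hypothetical toy setting, not the phylogeographic model.)"""
    s = sigma / math.sqrt(n)
    m0 = normal_pdf(xbar, 0.0, s)
    # Marginal under M1: the normal likelihood integrated over (-a, a),
    # divided by the prior normalisation 2a.
    mass = normal_cdf((half_width - xbar) / s) - normal_cdf((-half_width - xbar) / s)
    m1 = mass / (2.0 * half_width)
    return m0 / m1


if __name__ == "__main__":
    for a in (1.0, 10.0, 100.0, 1000.0):
        print(f"a = {a:7.1f}   B01 = {bayes_factor_01(0.3, 50, a):10.3f}")
```

With the same data, widening the prior from a=1 to a=1000 moves the Bayes factor from mildly favouring the alternative to overwhelmingly favouring the null, which is the “greater parameter space, much of it with low likelihood” mechanism described in the quote.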

## my DICussion

Posted in Books, Kids, pictures, Statistics, University life on September 25, 2013 by xi'an

Following the Re-Reading of Spiegelhalter et al. (2002) by David at the RSS Annual Conference a few weeks ago, and my invited discussion there, I was asked to contribute a written discussion to Series B, a request obviously impossible to refuse!

The main issue with DIC is the question of its worth for Bayesian decision analysis (since I doubt there are many proponents of DIC outside the Bayesian community). The appeal of DIC is, I presume, to deliver a single summary per model under comparison and to allow therefore for a complete ranking of those models. I however object to the worth of simplicity for simplicity’s sake: models are complex (albeit less than reality) and their usages are complex as well. To consider that model A is to be preferred over model B just because DIC(A)=1228 < DIC(B)=1237 is a mimicry of the complex mechanisms at play behind model choice, especially given the wealth of information provided by a Bayesian framework. (Non-Bayesian paradigms are more familiar with procedures based on a single estimator value.) And to abstain from accounting for the significance of the difference between DIC(A) and DIC(B) clearly makes matters worse.
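For readers who have not computed a DIC by hand, the following is a rough sketch of the standard construction from Spiegelhalter et al. (2002): DIC = D̄ + p_D, where D̄ is the posterior mean of the deviance and p_D = D̄ − D(θ̄) is the effective number of parameters. The toy model (normal observations with unit variance and a flat prior on the mean, so that the posterior is N(x̄, 1/n)), the function names and the simulated data are assumptions made purely for illustration.

```python
import math
import random


def deviance(data, mu):
    """Deviance -2 log L(mu) for observations x_i ~ N(mu, 1)."""
    n = len(data)
    return sum((x - mu) ** 2 for x in data) + n * math.log(2.0 * math.pi)


def dic_normal_mean(data, n_draws=20000, seed=1):
    """Monte Carlo DIC for the toy model x_i ~ N(mu, 1) with a flat prior on mu.
    Returns (DIC, p_D). Hypothetical helper, for illustration only."""
    rng = random.Random(seed)
    n = len(data)
    xbar = sum(data) / n
    # Under the flat prior, the posterior on mu is N(xbar, 1/n).
    draws = [rng.gauss(xbar, 1.0 / math.sqrt(n)) for _ in range(n_draws)]
    mean_dev = sum(deviance(data, mu) for mu in draws) / n_draws  # D-bar
    post_mean = sum(draws) / n_draws                              # theta-bar
    p_d = mean_dev - deviance(data, post_mean)                    # effective dimension
    return mean_dev + p_d, p_d


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.5, 1.0) for _ in range(30)]
    dic, p_d = dic_normal_mean(data)
    print(f"DIC = {dic:.2f}, p_D = {p_d:.3f}")
```

In this one-parameter model p_D comes out close to 1, as it should. The sketch returns a single number per model and nothing about the variability of that number, which is exactly the limitation raised in the paragraph above.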

This is not even discussing the stylised setting where one model is considered as “true” and where procedures are compared by their ability to recover the “truth”. David Spiegelhalter repeatedly mentioned during his talk that he was not interested in this. This stance brings another objection, though, namely that models can only be compared against their predictive abilities, which DIC seems unable to capture. Once again, what is needed is a multi-factor and all-encompassing criterion that evaluates the predictive models in terms of their recovery of some features of the phenomenon under study. Or of the process being conducted. (Even stooping down to a one-dimensional loss function that is supposed to summarise the purpose of the model comparison does not produce anything close to the DIC function.)

Obviously, considering that asymptotic consistency is of no importance whatsoever (as repeated by David in Newcastle) avoids some embarrassing questions, except the one about the true purpose of statistical models and procedures. How can they be compared if no model is true and if accumulating data from a given model is not meaningful? How can simulation be conducted in such a barren landscape? I find it all the more difficult to accept this minimalist attitude since models are truly used as if they were or could be true, at several stages in the process. It also prevents the study of the criterion under model misspecification, which would clearly be of interest.

Another point, already exposed in our 2006 Bayesian Analysis paper with Gilles Celeux, Florence Forbes, and Mike Titterington, is that there is no unique driving principle for constructing DICs. In that paper, we examined eight different and natural versions of DIC for mixture models, resulting in highly diverging values for DIC and the effective dimension of the parameter. I believe that such a lack of focus is bound to reappear in any multimodal setting and fear that the answer about (eight) different focuses on what matters in the model is too cursory and lacks direction for the hapless practitioner.

My final remark about DIC is that it shares very much the same perspective as Murray Aitkin’s integrated likelihood. Both Aitkin (1991, 2009) and Spiegelhalter et al. (2002) consider a posterior distribution on the likelihood function, taken as a function of the parameter but omitting the delicate fact that it also depends on the observable and hence does not exist a priori. We wrote a detailed review of Aitkin’s (2009) book, where most of the criticisms equally apply to DIC, and I will not repeat them here, except for pointing out that it escapes the Bayesian framework (and thus requires even more its own justifications).

Posted in Books, Statistics, Travel, University life on September 3, 2013 by xi'an

Today, I attended the RSS Annual Conference in Newcastle-upon-Tyne. For one thing, I ran a Memorial session in memory of George Casella, with my (and his) friends Jim Hobert and Elias Moreno as speakers. (The session was well-attended if not overwhelmingly so.) For another thing, the RSS decided to have the DIC Read Paper by David Spiegelhalter, Nicky Best, Brad Carlin and Angelika van der Linde, “Bayesian measures of model complexity and fit”, re-Read, and I was asked to re-discuss the 2002 paper. Here are the slides of my discussion, borrowing from the 2006 Bayesian Analysis paper with Gilles Celeux, Florence Forbes, and Mike Titterington where we examined eight different versions of DIC for mixture models. (I refrained from using the title “snow white and the seven DICs” for a slide…) I also borrowed from our recent discussion of Murray Aitkin’s (2009) book. The other discussant was Elias Moreno, who focussed on consistency issues. (More on this and David Spiegelhalter’s defence in a few posts!) This was the first time I was giving a talk on a basketball court (I once gave an exam there!).

Posted in Books, Statistics on March 23, 2013 by xi'an

This morning session at the workshop Recent Advances in statistical inference: theory and case studies was a true blessing for anyone working in Bayesian model choice! And it did give me ideas to complete my current paper on the Jeffreys-Lindley paradox, and more. Attending the talks in the historical Gioachino Rossini room of the fabulous Café Pedrocchi with the Italian spring blue sky as a background surely helped! (It is only beaten by this room of Ca’Foscari overlooking the Gran Canale where we had a workshop last Fall…)

First, Phil Dawid gave a talk on his current work with Monica Musio (who gave a preliminary talk on this in Venezia last fall) on the use of new score functions to compare statistical models. While the regular Bayes factor is based on the log score, comparing the logs of the predictives at the observed data, different functions of the predictive q can be used, like the Hyvärinen score

$S(x,q)=\Delta\sqrt{q(x)}\big/\sqrt{q(x)}$

which offers the immense advantage of being independent of the normalising constant and hence can also be used for improper priors. As written above, a very deep finding that could at last allow for the comparison of models based on improper priors without requiring convoluted constructions (see below) to make the “constants meet”. I first thought the technique was suffering from the same shortcoming as Murray Aitkin’s integrated likelihood, but I eventually figured out (where) I was wrong!
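To see why the normalising constant drops out, it may help to rewrite the score in terms of $\log q$. This is only a short derivation sketch, writing $\nabla$ for the gradient and $\Delta$ for the Laplacian in $x$; the normal example at the end is my own illustration, not one from the talk. Since $\sqrt{q}=\exp\{\tfrac{1}{2}\log q\}$,

$\nabla\sqrt{q}=\tfrac{1}{2}\sqrt{q}\,\nabla\log q,\qquad \Delta\sqrt{q}=\sqrt{q}\left\{\tfrac{1}{2}\Delta\log q+\tfrac{1}{4}\|\nabla\log q\|^2\right\}$

and therefore

$S(x,q)=\Delta\sqrt{q(x)}\big/\sqrt{q(x)}=\tfrac{1}{2}\Delta\log q(x)+\tfrac{1}{4}\|\nabla\log q(x)\|^2$

Replacing $q$ with $cq$ for any constant $c>0$ only adds $\log c$ to $\log q$, which vanishes under both $\nabla$ and $\Delta$, hence $S(x,cq)=S(x,q)$. For instance, when $q$ is the one-dimensional $\mathcal{N}(\mu,\sigma^2)$ density, $\nabla\log q(x)=-(x-\mu)/\sigma^2$ and $\Delta\log q(x)=-1/\sigma^2$, so that

$S(x,q)=\dfrac{(x-\mu)^2}{4\sigma^4}-\dfrac{1}{2\sigma^2}$

with no trace of the $1/\sqrt{2\pi\sigma^2}$ factor.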

The second talk was given by Ed George, who spoke on his recent research with Veronika Rocková dealing with variable selection via an EM algorithm that proceeds much much faster to the optimal collection of variables, when compared with the SSVS solution of George and McCulloch (JASA, 1993). (I remember discussing this paper with Ed in Laramie during the IMS meeting in the summer of 1993.) This resurgence of the EM algorithm in this framework is both surprising (as the missing data structure represented by the variable indicators could have been exploited much earlier) and exciting, because it opens a new way to explore the most likely models in this variable selection setting and to eventually produce the median model of Berger and Barbieri (Annals of Statistics, 2004). In addition, this approach allows for a fast comparison of prior modellings on the missing variable indicators, showing in some examples a definitive improvement brought by a Markov random field structure. Given that it also produces a marginal posterior density on the indicators, values of hyperparameters can be assessed, escaping the Jeffreys-Lindley paradox (which was clearly a central piece of today’s talks and discussions). I would like to see more details on the MRF part, as I wonder which structure is part of the input and which one is part of the inference.
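As a reminder of what the median model means, here is a small sketch, which is not the EM algorithm of the talk: given posterior probabilities over models encoded as tuples of 0/1 variable indicators, the median probability model keeps the variables whose posterior inclusion probability is at least 1/2. The function names and the toy posterior below are made up for illustration.

```python
def inclusion_probabilities(model_probs):
    """Posterior inclusion probability of each variable.
    model_probs maps tuples of 0/1 indicators to (possibly unnormalised)
    posterior model probabilities. Hypothetical helper for illustration."""
    n_vars = len(next(iter(model_probs)))
    total = sum(model_probs.values())
    return [
        sum(p for gamma, p in model_probs.items() if gamma[j]) / total
        for j in range(n_vars)
    ]


def median_model(model_probs):
    """Median probability model: include variable j when its posterior
    inclusion probability is at least 1/2."""
    return tuple(int(p >= 0.5) for p in inclusion_probabilities(model_probs))


if __name__ == "__main__":
    # Toy posterior over four models on three candidate variables (made-up values).
    posterior = {
        (1, 0, 0): 0.35,
        (1, 1, 0): 0.30,
        (0, 1, 1): 0.10,
        (1, 1, 1): 0.25,
    }
    print(inclusion_probabilities(posterior))
    print(median_model(posterior))
```

In this toy posterior the most probable model keeps only the first variable, while the median model keeps the first two, so the two summaries need not agree.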

The third talk of the morning was Susie Bayarri’s, about a collection of desiderata or criteria for building an objective prior in model comparison and achieving a manageable closed-form solution in the case of the normal linear model. I somehow disagree with the information criterion, which states that the divergence of the likelihood ratio should imply a corresponding divergence of the Bayes factor. And while I definitely agree with the invariance argument leading to using the same (improper) prior over parameters common to models under comparison, this may sound too much of a trick to outsiders, especially when accounting for the score solution of Dawid and Musio. Overall, though, I liked the outcome of a coherent reference solution for linear models that could clearly be used as a default in this setting, esp. given the availability of an R package called BayesVarSel. (Even if I also like our simpler solution developed in the incoming edition of Bayesian Core, also available in the bayess R package!) In his discussion, Guido Consonni highlighted the philosophical problem of considering “common parameters”, a perspective I completely subscribe to, even though I think all that matters is the justification of having a common prior over formally equivalent parameters, although this may sound like a pedantic distinction to many!

Due to a meeting of the scientific committee of the incoming O’Bayes 2013 meeting (in Duke, December, more about this soon!), most of whose members were attending this workshop, I missed the beginning of Alan Agresti’s talk and could not catch up with the central problem he was addressing (the pianist on the street outside started pounding on his instrument as if intent on breaking it apart!). A pity as problems with contingency tables are certainly of interest to me… By the end of Alan’s talk, I wished someone would shoot the pianist playing outside (even though he was reasonably gifted) as I had gotten a major headache from his background noise. Following Noel Cressie’s talk proved just as difficult, although I could see his point in comparing very diverse predictors for big Data problems without much of a model structure and even less of a … and I decided to call the day off, despite wishing to stay for Eduardo Gutiérrez-Pena’s talk on conjugate predictives and entropies which definitely interested me… Too bad really (blame the pianist!)

## inherent difficulties of non-Bayesian likelihood-based inference

Posted in Books, Statistics, University life on November 8, 2011 by xi'an

Following a series of rejections of our discussion of Murray Aitkin’s book, Statistical Inference, discussion written with Andrew Gelman and Judith Rousseau, by the journals Bayesian Analysis, JASA (Book Reviews), and Electronic Journal of Statistics, we have received an encouraging review from the journal Statistics and Risk Modeling (with Applications in Finance and Insurance), formerly Statistics and Decisions. Since the main request was to broaden our perspective, we revised the paper towards a more global analysis of the issues raised by Murray’s book. For a start, the title got changed from the maybe provocative “Do we need an integrated Bayesian/likelihood inference?” into the slightly archaic “Inherent Difficulties of Non-Bayesian Likelihood-based Inference, as Revealed by an Examination of a Recent Book by Aitkin”. If only to explain why it is broader than a mere book review… For another, the paper also addresses similar criticisms to the deviance information criterion (DIC). Hopefully, this revision will be considered more positively and turn into a discussion paper about this unBayesian use of Bayesian tools…