## Archive for Statistical Science

## nonparametric Bayesian clay for robust decision bricks

Posted in Statistics with tags Bayesian robustness, Chris Holmes, Conan Doyle, discussion paper, James Watson, Judith Rousseau, Statistical Science on January 30, 2017 by xi'an

**J**ust received an email today that our discussion with Judith of Chris Holmes and James Watson’s paper was now published as *Statistical Science 2016, Vol. 31, No. 4, 506-510*… While it is almost identical to the arXiv version, it can be read on-line.

## ISBA 2016 [#3]

Posted in pictures, Running, Statistics, Travel, University life, Wines with tags ABC, approximate likelihood, Calasetta, ISBA 2016, j-ISBA, loss function, restricted inference, San' Antioco, Sardinia, Statistical Science, summary statistics on June 16, 2016 by xi'an

Among the sessions I attended yesterday, I really liked the one on robustness and model misspecification. Especially the talk by Steve MacEachern on Bayesian inference based on insufficient statistics, with a striking graph of the degradation of the Bayes factor as the prior variance increases. I sadly had no time to grab a picture of the graph, which compared this poor performance against a stable rendering when using a proper summary statistic. It clearly relates to our work on ABC model choice, as well as to my worries about the Bayes factor, so this explains why I am quite excited about this notion of restricted inference. In this session, Chris Holmes also summarised his two recent papers on loss-based inference, which I discussed here in a few posts, including the Statistical Science discussion Judith and I wrote recently. I also went to the j-ISBA [section] session which was sadly under-attended, maybe due to too many parallel sessions, maybe due to the lack of a unifying statistical theme.

## comments on Watson and Holmes

Posted in Books, pictures, Statistics, Travel with tags Bayesian Analysis, Bayesian robustness, Conan Doyle, decision theory, Dirichlet process, Kullback-Leibler divergence, Saint Giles cemetery, Sherlock Holmes, Statistical Science, University of Oxford on April 1, 2016 by xi'an

“The world is full of obvious things which nobody by any chance ever observes.” The Hound of the Baskervilles

**I**n connection with the forthcoming publication of James Watson’s and Chris Holmes’ *Approximating models and robust decisions* in Statistical Science, Judith Rousseau and I wrote a discussion on the paper that was arXived yesterday.

“Overall, we consider that the calibration of the Kullback-Leibler divergence remains an open problem.” (p.18)

While the paper connects with earlier ones by Chris and coauthors, and possibly *despite* the overall critical tone of the comments!, I really appreciate the renewed interest in robustness advocated in this paper. I was going to write *Bayesian robustness* but to differ from the perspective adopted in the 90’s where robustness was mostly about the prior, I would say this is rather a Bayesian approach to model robustness from a decisional perspective. With definitive innovations like considering the impact of posterior uncertainty over the decision space, uncertainty being defined e.g. in terms of Kullback-Leibler neighbourhoods. Or with a Dirichlet process distribution on the posterior. This may step out of the standard Bayesian approach but it remains of definite interest! (And note that this discussion of ours [reluctantly!] refrained from capitalising on the names of the authors to build easy puns linked with the most Bayesian of all detectives!)

## Harold Jeffreys’ default Bayes factor [for psychologists]

Posted in Books, Statistics, University life with tags Bayesian hypothesis testing, Dickey-Savage ratio, Harold Jeffreys, overfitting, Statistical Science, testing, Theory of Probability on January 16, 2015 by xi'an

*“One of Jeffreys’ goals was to create default Bayes factors by using prior distributions that obeyed a series of general desiderata.”*

**T**he paper *Harold Jeffreys’s default Bayes factor hypothesis tests: explanation, extension, and application in Psychology* by Alexander Ly, Josine Verhagen, and Eric-Jan Wagenmakers is both a survey and a reinterpretation *cum* explanation of Harold Jeffreys’ views on testing. At about the same time, I received a copy from Alexander and a copy from the journal it had been submitted to! This work starts with a short historical entry on Jeffreys’ work and career, which includes four of his principles, quoted *verbatim* from the paper:

- “scientific progress depends primarily on induction”;
- “in order to formalize induction one requires a logic of partial belief” [enters the Bayesian paradigm];
- “scientific hypotheses can be assigned prior plausibility in accordance with their complexity” [a.k.a., Occam’s razor];
- “classical “Fisherian” p-values are inadequate for the purpose of hypothesis testing”.

“The choice of π(σ) [is] therefore irrelevant for the Bayes factor as long as we use the same weighting function in both models”

A very relevant point made by the authors is that Jeffreys *only* considered embedded or nested hypotheses, a fact that allows for having common parameters between models and hence some form of reference prior. Even though (a) I dislike the notion of “common” parameters and (b) I do not think it is entirely legit (I was going to write proper!) from a mathematical viewpoint to use the same (improper) prior on both sides, as discussed in our Statistical Science paper. And in our most recent alternative proposal. The most delicate issue however is to derive a reference prior on the parameter of interest, which is *fixed* under the null and *unknown* under the alternative. Hence preventing the use of improper priors. Jeffreys tried to calibrate the corresponding prior by imposing asymptotic consistency under the alternative. And exact indeterminacy under “completely uninformative” data. Unfortunately, this is not a well-defined notion. In the normal example, the authors recall and follow the proposal of Jeffreys to use an improper prior π(σ)∝1/σ on the nuisance parameter and argue in his defence the quote above. I find this argument quite weak because suddenly the prior on σ becomes a *weighting function*… A notion foreign to the Bayesian cosmology. If we use an improper prior for π(σ), the marginal likelihood on the data is no longer a probability density and I do not buy the argument that one should use the *same* measure with the *same* constant both on σ alone [for the nested hypothesis] and on the σ part of (μ,σ) [for the nesting hypothesis]. We are considering two spaces with different dimensions and hence orthogonal measures. This quote thus sounds more like wishful thinking than like a justification. Similarly, the assumption of independence between δ=μ/σ and σ does not make sense for σ-finite measures.

Note that the authors later point out that (a) the posterior on σ varies between models despite using the *same* data [which shows that the parameter σ is far from common to both models!] and (b) the [testing] Cauchy prior on δ is only useful for the testing part and should be replaced with another [estimation] prior when the model has been selected. Which may end up as a backfiring argument about this default choice.

“Each updated weighting function should be interpreted as a posterior in estimating σ within their own context, the model.”

The re-derivation of Jeffreys’ conclusion that a Cauchy prior should be used on δ=μ/σ makes it clear that this choice only proceeds from an imperative of fat tails in the prior, without solving the calibration of the Cauchy scale. (Given the now-available modern computing tools, it would be nice to see the impact of this scale γ on the numerical value of the Bayes factor.) And maybe it also proceeds from a “hidden agenda” to achieve a Bayes factor that *solely* depends on the *t* statistic. Although this does not sound like a compelling reason to me, since the *t* statistic is not sufficient in this setting.
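To make the parenthetical remark about the scale γ concrete, here is a minimal sketch (mine, not taken from the paper) of how the one-sample Bayes factor moves with γ. It assumes the usual representation of the Cauchy(0, γ) prior on δ as a scale mixture of normals, δ|g ~ N(0, g) with g following an inverse-gamma(1/2, γ²/2) distribution, together with π(σ)∝1/σ, so that the Bayes factor only involves the *t* statistic and the sample size *n*. The function names, the integration grid and the values t=2.5, n=20 are purely illustrative assumptions.

```python
import math


def cauchy_scale_mixture_integral(f, gamma, lo=-30.0, hi=40.0, steps=7000):
    """Trapezoid approximation of the integral of f(g) * pi(g) over g > 0.

    pi(g) is the inverse-gamma(1/2, gamma^2/2) density, i.e. the mixing density
    that turns delta | g ~ N(0, g) into delta ~ Cauchy(0, gamma).
    The integral is computed on the log scale, g = exp(u), u in [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        g = math.exp(u)
        # pi(g) * dg/du = gamma / sqrt(2 pi) * g^(-1/2) * exp(-gamma^2 / (2 g))
        weight = (gamma / math.sqrt(2.0 * math.pi)) * g ** -0.5 \
            * math.exp(-gamma ** 2 / (2.0 * g))
        term = f(g) * weight
        total += 0.5 * term if i in (0, steps) else term
    return total * h


def bf10_one_sample_t(t, n, gamma=1.0):
    """Bayes factor of H1: delta ~ Cauchy(0, gamma) against H0: delta = 0.

    Depends on the data only through the t statistic and the sample size n
    (assumed model: normal observations, pi(sigma) proportional to 1/sigma).
    """
    nu = n - 1  # degrees of freedom

    def conditional(g):
        # Unnormalised marginal of t given g, up to the constant shared with H0
        shrink = 1.0 + n * g
        return shrink ** -0.5 * (1.0 + t * t / (shrink * nu)) ** (-(nu + 1) / 2.0)

    numerator = cauchy_scale_mixture_integral(conditional, gamma)
    denominator = (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)
    return numerator / denominator


# Illustrative (made-up) values of t and n, over a range of scales gamma
for gamma in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(gamma, bf10_one_sample_t(t=2.5, n=20, gamma=gamma))
```

Running the loop over a grid of γ's gives a direct picture of how much the "default" answer hinges on the default scale, which is all this sketch is meant to show.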

In a differently interesting way, the authors mention the Savage-Dickey ratio (p.16) as a way to represent the Bayes factor for nested models, without necessarily perceiving the mathematical difficulty with this ratio that we pointed out a few years ago. For instance, in the psychology example processed in the paper, the test is between δ=0 and δ≥0; however, if I set π(δ=0)=0 under the alternative prior, which should not matter *[from a measure-theoretic perspective where the density is uniquely defined almost everywhere]*, the Savage-Dickey representation of the Bayes factor returns zero, instead of 9.18!
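As a toy check of the Savage-Dickey representation (a sketch under simplifying assumptions of my own, not the psychology example of the paper), take a normal mean with *known* variance σ², testing μ=0 against μ ~ N(0, τ²). In this conjugate case the ratio of marginal likelihoods and the ratio of posterior to prior densities at μ=0 can both be written in closed form, and they agree provided the *continuous* versions of both densities are the ones evaluated at zero. All names and numerical values below are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bf01_marginal(xbar, n, sigma2, tau2):
    """B01 as a ratio of marginal densities of the sample mean xbar."""
    s2 = sigma2 / n  # sampling variance of xbar
    return normal_pdf(xbar, 0.0, s2) / normal_pdf(xbar, 0.0, s2 + tau2)


def bf01_savage_dickey(xbar, n, sigma2, tau2):
    """B01 as posterior density over prior density of mu at mu = 0.

    Both densities are taken in their continuous (normal) versions.
    """
    s2 = sigma2 / n
    post_var = s2 * tau2 / (s2 + tau2)
    post_mean = tau2 * xbar / (s2 + tau2)
    return normal_pdf(0.0, post_mean, post_var) / normal_pdf(0.0, 0.0, tau2)


# Illustrative (made-up) values
direct = bf01_marginal(xbar=0.3, n=25, sigma2=1.0, tau2=1.0)
ratio = bf01_savage_dickey(xbar=0.3, n=25, sigma2=1.0, tau2=1.0)
print(direct, ratio)
```

The marginal-likelihood version only involves integrals and is thus blind to the value given to the prior density at the single point μ=0, while the pointwise ratio is not, which is where the dependence on a specific version of the density comes from.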

“In general, the fact that different priors result in different Bayes factors should not come as a surprise.”

The second example detailed in the paper is the test for a zero Gaussian correlation. This is a sort of “ideal case” in that the parameter of interest is between -1 and 1, hence makes the choice of a uniform U(-1,1) easy or easier to argue. Furthermore, the setting is also “ideal” in that the Bayes factor simplifies down into a marginal over the sample correlation only, under the usual Jeffreys priors on means and variances. So we have a second case where the frequentist statistic behind the frequentist test[ing procedure] is also the single (and insufficient) part of the data used in the Bayesian test[ing procedure]. Once again, we are in a setting where Bayesian and frequentist answers are in one-to-one correspondence (at least for a fixed sample size). And where the Bayes factor allows for a closed form through hypergeometric functions. Even in the one-sided case. (This is a result obtained by the authors, not by Jeffreys who, as the proper physicist he was, obtained approximations that are remarkably accurate!)

“The fact that the Bayes factor is independent of the intention with which the data have been collected is of considerable practical importance.”

The authors have a side argument in this section in favour of the Bayes factor against the p-value, namely that the “Bayes factor does not depend on the sampling plan” (p.29), but I find this fairly weak (or tongue in cheek) as the Bayes factor *does* depend on the sampling distribution imposed on top of the data. It appears that the argument is mostly used to defend sequential testing.

“The Bayes factor (…) balances the tension between parsimony and goodness of fit, (…) against overfitting the data.”

*In fine*, I liked very much this re-reading of Jeffreys’ approach to testing, maybe the more because I now think we should get away from it! I am not certain it will help in convincing psychologists to adopt Bayes factors for assessing their experiments as it may instead frighten them away. And it does not bring an answer to the vexing issue of the relevance of point null hypotheses. But it constitutes a lucid and innovative account of the major advance represented by Jeffreys’ formalisation of Bayesian testing.

## did I mean endemic? [pardon my French!]

Posted in Books, Statistics, University life with tags Air France, Bayesian Analysis, censoring, endemic, Glasgow, guest editors, information theory, Larry Wasserman, Robins-Wasserman paradox, Statistical Science, translation, Ubiquitous Chip on June 26, 2014 by xi'an

**D**eborah Mayo wrote a Saturday night special column on our Big Bayes stories issue in *Statistical Science*. She (predictably?) focussed on the critical discussions, esp. David Hand’s most forceful arguments where he essentially considers that, due to our (special issue editors’) selection of successful stories, we biased the debate by providing a “one-sided” story. And that we or the editor of *Statistical Science* should also have included frequentist stories. To which Deborah points out that demonstrating that “only” a frequentist solution is available may be beyond the possible. And still, I could think of partial information and partial inference problems like the “paradox” raised by Jamie Robins and Larry Wasserman in the past years. (Not the normalising constant paradox but the one about censoring.) Anyway, the goal of this special issue was to provide a range of realistic illustrations where Bayesian analysis was a most reasonable approach, not to raise the Bayesian flag against other perspectives: in an ideal world it would have been more interesting to get discussants to produce alternative analyses bypassing the Bayesian modelling but obviously discussants only have a limited amount of time to dedicate to their discussion(s) and the problems were complex enough to deter any attempt in this direction.

**A**s an aside and in explanation of the cryptic title of this post, Deborah wonders at my use of *endemic* in the preface and at the possible mis-translation from the French. I did mean *endemic* (and *endémique*) in a half-joking reference to a disease one cannot completely get rid of. At least in French, the term extends beyond diseases, but presumably *pervasive* would have been less confusing… Or *ubiquitous* (as in Ubiquitous Chip for those with Glaswegian ties!). She also expresses “surprise at the choice of name for the special issue. Incidentally, the “big” refers to the bigness of the problem, not big data. Not sure about “stories”.” Maybe another occurrence of lost in translation… I had indeed no intent of connection with the “big” of “Big Data”, but wanted to convey the notion of a big as in major problem. And of a story explaining why the problem was considered and how the authors reached a satisfactory analysis. The story of the Air France Rio-Paris crash resolution is representative of that intent. (Hence the explanation for the above picture.)

## big Bayes stories

Posted in Books, Statistics, University life with tags Adrian Smith, Air France, Baltic salmon, Bayesian data analysis, big Bayes, galaxy formation, HIV, lynbya, population predictions, quasars, Sharon McGrayne, special issue, Statistical Science, the theory that would not die, United Nations on July 29, 2013 by xi'an

*(The following is our preface to the incoming “Big Bayes stories” special issue of Statistical Science, edited by Sharon McGrayne, Kerrie Mengersen and myself.)*

**B**ayesian statistics is now endemic in many areas of scientific, business and social research. Founded a quarter of a millennium ago, the enabling theory, models and computational tools have expanded exponentially in the past thirty years. So what is it that makes this approach so popular in practice? Now that Bayesian statistics has “grown up”, what has it got to show for itself? In particular, what real-life problems has it really solved? A number of events motivated us to ask these questions: a conference in honour of Adrian Smith, one of the founders of modern Bayesian Statistics, which showcased a range of research emanating from his seminal work in the field, and the impressive book by Sharon McGrayne, *the theory that would not die*. At a café in Paris in 2011, we conceived the idea of gathering a similar collection of “Big Bayes stories”, that would demonstrate the appeal of adopting a Bayesian modelling approach in practice. That is, we wanted to collect real cases in which a Bayesian approach had made a significant difference, either in addressing problems that could not be analysed otherwise, or in generating a new or deeper understanding of the data and the associated real-life problem.

**A**fter submitting this proposal to Jon Wellner, editor of Statistical Science, and obtaining his encouragement and support, we made a call for proposals. We received around 30 submissions (for which authors are to be warmly thanked!) and after a regular review process by both Bayesian and non-Bayesian referees (who are also deeply thanked), we ended up with 17 papers that reflected the type of stories we had hoped to hear. Sharon McGrayne then read each paper with the utmost attention and provided helpful and encouraging comments on all. Sharon became part of the editorial team in acknowledgement of this substantial editing contribution, which has made the stories much more enjoyable. In addition, referees who handled several submissions were asked to contribute discussions about the stories and some of them managed to find additional time for this task, providing yet another perspective on the stories.

**A**s can be gathered from the table of contents, the spectrum of applications ranges across astronomy, epidemiology, ecology and demography, with the special case of the Air France wreckage story also reported in the paperback edition of *the theory that would not die*. What made those cases so well suited for a Bayesian solution? In some situations, the prior or the expert opinion was crucial; in others, the complexity of the data model called for a hierarchical decomposition naturally provided in a Bayesian framework; and others involved many actors, perspectives and data sources that only Bayesian networks could aggregate.

**N**ow, before or (better) after reading those stories, one may wonder whether or not the “plus” brought by the Bayesian paradigm was truly significant. We think it did, at one level or another of the statistical analysis, while we acknowledge that in several cases other statistical perspectives or even other disciplines could have provided another solution, but presumably at a higher cost. We think this collection of papers constitutes a worthy tribute to the maturity of the Bayesian paradigm, appropriate for commemorating the 250th anniversary of the publication of Bayes’ *Essay towards solving a Problem in the Doctrine of Chances*. We thus hope you will enjoy those stories, whether or not Bayesiana is your statistical republic.

## back from down under

Posted in Books, pictures, R, Statistics, Travel, University life with tags ACM Transactions on Modeling and Computer Simulation, AMSI, Australia, book reviews, BRAG, CHANCE, George Casella, Kerrie Mengersen, Monash, QUT, R, R exam, Statistical Science, TOMACS, vacations on August 30, 2012 by xi'an

**A**fter a sunny weekend to unpack and unwind, I am now back to my normal schedule, on my way to Paris-Dauphine for an R (second-chance) exam. Except for confusing my turn signal for my wiper, thanks to two weeks of intensive driving in four Australian states!, things are thus back to “normal”, meaning that I have enough control over my time to handle both daily chores like the R exam and long-term projects. Including the special issues of Statistical Science, TOMACS, and CHANCE (reviewing all books of George Casella *in memoriam*). And the organisation of MCMSki 4, definitely taking place in Chamonix on January 6-8, 2014, hopefully under the sponsorship of the newly created BayesComp section of ISBA. And enough broadband to check my usual sites and to blog *ad nauseam*.

**T**his trip to Australia, along with the AMSI Lectures as well as the longer visits to Monash and QUT, has been quite an exciting time, with many people met and ideas discussed. I came back with a (highly positive) impression of Australian universities as very active places, in line with my impression of Australia being a very dynamic and thriving country, far far away from the European recession. I was particularly impressed by the number of students within Kerrie Mengersen’s BRAG group, when we held discussions in classrooms that felt as full as a regular undergrad class! Those discussions and meetings set me towards a few new projects along the themes of mixture estimation and model choice, as well as convergence assessment. During this trip, I however also felt the lack of the long “free times” I have gotten used to, thanks to the IUF chair support, where I can pursue a given problem for a few hours without interruption. Which means that I did not work as much as I wanted to during this tour and will certainly avoid such multiple-step trips in the near future. Nonetheless, overall, the “down under” experience was quite worth it! (Even without considering the two weeks of vacations I squeezed in the middle.)

**B**ack to “normal” also means I already had two long delays caused by suicides on my train line…