Following my earlier comments on Alexander Ly, Josine Verhagen, and Eric-Jan Wagenmakers, from Amsterdam, Joris Mulder, a special issue editor of the Journal of Mathematical Psychology, kindly asked me for a written discussion of that paper, discussion that I wrote last week and arXived this weekend. Besides the above comments on ToP, this discussion contains some of my usual arguments against the use of the Bayes factor as well as a short introduction to our recent proposal via mixtures. Short introduction as I had to restrain myself from reproducing the arguments in the original paper, for fear it would jeopardize its chances of getting published and, who knows?, discussed.
Archive for Theory of Probability
“One of Jeffreys’ goals was to create default Bayes factors by using prior distributions that obeyed a series of general desiderata.”
The paper Harold Jeffreys’s default Bayes factor hypothesis tests: explanation, extension, and application in Psychology by Alexander Ly, Josine Verhagen, and Eric-Jan Wagenmakers is both a survey and a reinterpretation cum explanation of Harold Jeffreys‘ views on testing. At about the same time, I received a copy from Alexander and a copy from the journal it had been submitted to! This work starts with a short historical entry on Jeffreys’ work and career, which includes four of his principles, quoted verbatim from the paper:
- “scientific progress depends primarily on induction”;
- “in order to formalize induction one requires a logic of partial belief” [enters the Bayesian paradigm];
- “scientific hypotheses can be assigned prior plausibility in accordance with their complexity” [a.k.a., Occam’s razor];
- “classical “Fisherian” p-values are inadequate for the purpose of hypothesis testing”.
“The choice of π(σ) therefore irrelevant for the Bayes factor as long as we use the same weighting function in both models”
A very relevant point made by the authors is that Jeffreys only considered embedded or nested hypotheses, a fact that allows for having common parameters between models and hence some form of reference prior. Even though (a) I dislike the notion of “common” parameters and (b) I do not think it is entirely legit (I was going to write proper!) from a mathematical viewpoint to use the same (improper) prior on both sides, as discussed in our Statistical Science paper. And in our most recent alternative proposal. The most delicate issue however is to derive a reference prior on the parameter of interest, which is fixed under the null and unknown under the alternative. Hence preventing the use of improper priors. Jeffreys tried to calibrate the corresponding prior by imposing asymptotic consistency under the alternative. And exact indeterminacy under “completely uninformative” data. Unfortunately, this is not a well-defined notion. In the normal example, the authors recall and follow the proposal of Jeffreys to use an improper prior π(σ)∝1/σ on the nuisance parameter and argue in his defence the quote above. I find this argument quite weak because suddenly the prior on σ becomes a weighting function... A notion foreign to the Bayesian cosmology. If we use an improper prior for π(σ), the marginal likelihood on the data is no longer a probability density and I do not buy the argument that one should use the same measure with the same constant both on σ alone [for the nested hypothesis] and on the σ part of (μ,σ) [for the nesting hypothesis]. We are considering two spaces with different dimensions and hence orthogonal measures. This quote thus sounds more like wishful thinking than like a justification. Similarly, the assumption of independence between δ=μ/σ and σ does not make sense for σ-finite measures. Note that the authors later point out that (a) the posterior on σ varies between models despite using the same data [which shows that the parameter σ is far from common to both models!] and (b) the [testing] Cauchy prior on δ is only useful for the testing part and should be replaced with another [estimation] prior when the model has been selected. Which may end up as a backfiring argument about this default choice.
“Each updated weighting function should be interpreted as a posterior in estimating σ within their own context, the model.”
The re-derivation of Jeffreys’ conclusion that a Cauchy prior should be used on δ=μ/σ makes it clear that this choice only proceeds from an imperative of fat tails in the prior, without solving the calibration of the Cauchy scale. (Given the now-available modern computing tools, it would be nice to see the impact of this scale γ on the numerical value of the Bayes factor.) And maybe it also proceeds from a “hidden agenda” to achieve a Bayes factor that solely depends on the t statistic. Although this does not sound like a compelling reason to me, since the t statistic is not sufficient in this setting.
In a differently interesting way, the authors mention the Savage-Dickey ratio (p.16) as a way to represent the Bayes factor for nested models, without necessarily perceiving the mathematical difficulty with this ratio that we pointed out a few years ago. For instance, in the psychology example processed in the paper, the test is between δ=0 and δ≥0; however, if I set π(δ=0)=0 under the alternative prior, which should not matter [from a measure-theoretic perspective where the density is uniquely defined almost everywhere], the Savage-Dickey representation of the Bayes factor returns zero, instead of 9.18!
“In general, the fact that different priors result in different Bayes factors should not come as a surprise.”
The second example detailed in the paper is the test for a zero Gaussian correlation. This is a sort of “ideal case” in that the parameter of interest is between -1 and 1, hence makes the choice of a uniform U(-1,1) easy or easier to argue. Furthermore, the setting is also “ideal” in that the Bayes factor simplifies down into a marginal over the sample correlation only, under the usual Jeffreys priors on means and variances. So we have a second case where the frequentist statistic behind the frequentist test[ing procedure] is also the single (and insufficient) part of the data used in the Bayesian test[ing procedure]. Once again, we are in a setting where Bayesian and frequentist answers are in one-to-one correspondence (at least for a fixed sample size). And where the Bayes factor allows for a closed form through hypergeometric functions. Even in the one-sided case. (This is a result obtained by the authors, not by Jeffreys who, as the proper physicist he was, obtained approximations that are remarkably accurate!)
“The fact that the Bayes factor is independent of the intention with which the data have been collected is of considerable practical importance.”
The authors have a side argument in this section in favour of the Bayes factor against the p-value, namely that the “Bayes factor does not depend on the sampling plan” (p.29), but I find this fairly weak (or tongue in cheek) as the Bayes factor does depend on the sampling distribution imposed on top of the data. It appears that the argument is mostly used to defend sequential testing.
“The Bayes factor (…) balances the tension between parsimony and goodness of fit, (…) against overfitting the data.”
In fine, I liked very much this re-reading of Jeffreys’ approach to testing, maybe the more because I now think we should get away from it! I am not certain it will help in convincing psychologists to adopt Bayes factors for assessing their experiments as it may instead frighten them away. And it does not bring an answer to the vexing issue of the relevance of point null hypotheses. But it constitutes a lucid and innovative of the major advance represented by Jeffreys’ formalisation of Bayesian testing.
Albert Jacquard passed away last week. He was a humanist, engaged in the defence of outcasts (laissés pour compte) like homeless and illegal immigrants. He had a regular chronicle of two minutes on France Culture that I used to listen to (when driving at that time of the day). In the obituaries published in the recent days, this side of the character was put forward, while very little was said about his scientific legacy. He was a statistician, first at INSEE, then at INED. After getting a PhD in genetics from Stanford in 1968, he got back to INED as a population geneticist, writing in 1978 his most famous book, Éloge de la Différence, against racial theories, which is the first in a long series of vulgarisation and philosophical books. Among his scientific books, he wrote the entry on Probabilités in the popular vulgarisation series “Que Sais-Je?”, with more than 40,000 copies sold and used by generations of students. (Among its 125 pages, the imposed length for a “Que Sais-Je?”, the book includes Bayes theorem and, more importantly, the Bayesian approach to estimating unknown probabilities!)
Aris Spanos just published a paper entitled “Who should be afraid of the Jeffreys-Lindley paradox?” in the journal Philosophy of Science. This piece is a continuation of the debate about frequentist versus llikelihoodist versus Bayesian (should it be Bayesianist?! or Laplacist?!) testing approaches, exposed in Mayo and Spanos’ Error and Inference, and discussed in several posts of the ‘Og. I started reading the paper in conjunction with a paper I am currently writing for a special volume in honour of Dennis Lindley, paper that I will discuss later on the ‘Og…
“…the postdata severity evaluation (…) addresses the key problem with Fisherian p-values in the sense that the severity evaluation provides the “magnitude” of the warranted discrepancy from the null by taking into account the generic capacity of the test (that includes n) in question as it relates to the observed data”(p.88)
First, the antagonistic style of the paper is reminding me of Spanos’ previous works in that it relies on repeated value judgements (such as “Bayesian charge”, “blatant misinterpretation”, “Bayesian allegations that have undermined the credibility of frequentist statistics”, “both approaches are far from immune to fallacious interpretations”, “only crude rules of thumbs”, &tc.) and rhetorical sleights of hand. (See, e.g., “In contrast, the severity account ensures learning from data by employing trustworthy evidence (…), the reliability of evidence being calibrated in terms of the relevant error probabilities” [my stress].) Connectedly, Spanos often resorts to an unusual [at least for statisticians] vocabulary that amounts to newspeak. Here are some illustrations: “summoning the generic capacity of the test”, ‘substantively significant”, “custom tailoring the generic capacity of the test”, “the fallacy of acceptance”, “the relevance of the generic capacity of the particular test”, yes the term “generic capacity” is occurring there with a truly high frequency. Continue reading
“If you placed your finger at that point, the two halves of the string would still be able to vibrate in the sin 2x pattern, but not in the sin x one. This explains the Pythagorean discovery that a string half as long produced a note one octave higher.” (p.143)
The following chapters are all about Physics: the wave equation, Fourier’s transform and the heat equation, Navier-Stokes’ equation(s), Maxwell’s equation(s)—as in The universe in zero word—, the second law of thermodynamics, E=mc² (of course!), and Schrödinger’s equation. I won’t go so much into details for those chapters, even though they are remarkably written. For instance, the chapter on waves made me understand the notion of harmonics in a much more intuitive and lasting way than previous readings. (This chapter 8 also mentions the “English mathematician Harold Jeffreys“, while Jeffreys was primarily a geophysicist. And a Bayesian statistician with major impact on the field, his Theory of Probability arguably being the first modern Bayesian book. Interestingly, Jeffreys also was the first one to find approximations to the Schrödinger’s equation, however he is not mentioned in this later chapter.) Chapter 9 mentions the heat equation but is truly about Fourier’s transform which he uses as a tool and later became a universal technique. It also covers Lebesgue’s integration theory, wavelets, and JPEG compression. Chapter 10 on Navier-Stokes’ equation also mentions climate sciences, where it takes a (reasonable) stand. Chapter 11 on Maxwell’s equations is a short introduction to electromagnetism, with radio the obvious illustration. (Maybe not the best chapter in the book.) Continue reading
Our paper with Andrew Gelman, “Not only defended but also applied”: the perceived absurdity of Bayesian inference, has been reviewed for the second time and is to appear in The American Statistician, as a discussion paper. Terrific news! This is my first discussion paper in The American Statistician (and the second in total, the first one being the re-read of Jeffreys‘ Theory of Probability.) [The updated version is now on arXiv.]
“…the argument is false that because some ideal form of this approach to reasoning seems excellent n theory it therefore follows that in practice using this and only this approach to reasoning is the right thing to do.” Stephen Senn, 2011
Deborah Mayo, Aris Spanos, and Kent Staley have edited a special issue of Rationality, Markets and Morals (RMM) (a rather weird combination, esp. for a journal name!) on “Statistical Science and Philosophy of Science: Where Do (Should) They Meet in 2011 and Beyond?” for which comments are open. Stephen Senn has a paper therein entitled You May Believe You Are a Bayesian But You Are Probably Wrong in his usual witty, entertaining, and… Bayesian-bashing style! I find it very kind of him to allow us to remain in the wrong, very kind indeed…
Now, the paper somehow intersects with the comments Stephen made on our review of Harold Jeffreys’ Theory of Probability a while ago. It contains a nice introduction to the four great systems of statistical inference, embodied by de Finetti, Fisher, Jeffreys, and Neyman plus Pearson. The main criticism of Bayesianism à la de Finetti is that it is so perfect as to be outworldish. And, since this perfection is lost in the practical implementation, there is no compelling reason to be a Bayesian. Worse, that all practical Bayesian implementations conflict with Bayesian principles. Hence a Bayesian author “in practice is wrong”. Stephen concludes with a call for eclecticism, quite in line with his usual style since this is likely to antagonise everyone. (I wonder whether or not having no final dot to the paper has a philosophical meaning. Since I have been caught in over-interpreting book covers, I will not say more!) As I will try to explain below, I believe Stephen has paradoxically himself fallen victim of over-theorising/philosophising! (Referring the interested reader to the above post as well as to my comments on Don Fraser’s “Is Bayes posterior quick and dirty confidence?” for more related points. Esp. about Senn’s criticisms of objective Bayes on page 52 that are not so central to this discussion… Same thing for the different notions of probability [p.49] and the relative difficulties of the terms in (2) [p.50]. Deborah Mayo has a ‘deconstructed” version of Stephen’s paper on her blog, with a much deeper if deBayesian philosophical discussion. And then Andrew Jaffe wrote a post in reply to Stephen’s paper. Whose points I cannot discuss for lack of time, but with an interesting mention of Jaynes as missing in Senn’s pantheon.)
“The Bayesian theory is a theory on how to remain perfect but it does not explain how to become good.” Stephen Senn, 2011
While associating theories with characters is a reasonable rethoretical device, especially with large scale characters as the one above!, I think it deters the reader from a philosophical questioning on the theory behind the (big) man. (In fact, it is a form of bullying or, more politely (?), of having big names shoved down your throat as a form of argument.) In particular, Stephen freezes the (Bayesian reasoning about the) Bayesian paradigm in its de Finetti phase-state, arguing about what de Finetti thought and believed. While this is historically interesting, I do not see why we should care at the praxis level. (I have made similar comments on this blog about the unpleasant aspects of being associated with one character, esp. the mysterious Reverent Bayes!) But this is not my main point.
“…in practice things are not so simple.” Stephen Senn, 2011
The core argument in Senn’s diatribe is that reality is always more complex than the theory allows for and thus that a Bayesian has to compromise on her/his perfect theory with reality/practice in order to reach decisions. A kind of philosophical equivalent to Achille and the tortoise. However, it seems to me that the very fact that the Bayesian paradigm is a learning principle implies that imprecisions and imperfections are naturally endowed into the decision process. Thus avoiding the apparent infinite regress (Regress ins Unendliche) of having to run a Bayesian analysis to derive the prior for the Bayesian analysis at the level below (which is how I interpret Stephen’s first paragraph in Section 3). By refusing the transformation of a perfect albeit ideal Bayesian into a practical if imperfect bayesian (or coherent learner or whatever name that does not sound like being a member of a sect!), Stephen falls short of incorporating the contrainte de réalité into his own paradigm. The further criticisms found about prior justification, construction, evaluation (pp.59-60) are also of that kind, namely preventing the statistician to incorporate a degree of (probabilistic) uncertainty into her/his analysis.
In conclusion, reading Stephen’s piece was a pleasant and thought-provoking moment. I am glad to be allowed to believe I am a Bayesian, even though I do not believe it is a belief! The praxis of thousands of scientists using Bayesian tools with their personal degree of subjective involvement is an evolutive organism that reaches much further than the highly stylised construct of de Finetti (or of de Finetti restaged by Stephen!). And appropriately getting away from claims to being perfect or right. Or even being more philosophical.