After a very, very long delay, we eventually re-revised our paper about necessary and sufficient conditions on summary statistics to be relevant for model choice (i.e. to lead to consistent tests). Reasons, both good and bad, abound for this delay! Some (rather bad) were driven by the completion of a certain new edition… Some (fairly good) are connected with the requests from the Series B editorial team, towards improving our methodological input. As a result we put more emphasis on the post-ABC cross-checking of the relevance of the summary choice, via a predictive posterior evaluation of the means of the summary statistic under both models and a test for mean equality. And re-ran a series of experiments on a three-population population genetics example. Plus, on the side, simplified some of our assumptions. I dearly hope the paper can make it through, but am also looking forward to the opinion of the Series B editorial team. The next version of Relevant statistics for Bayesian model choice should be arXived by now (meaning when this post appears!).
Archive for Pierre Simon de Laplace
This morning I gave my talk on ABC: computation or inference? at the appliBUGS seminar. Here, in Paris, BUGS stands for Bayesian United Group of Statisticians! Presumably in connection with a strong football culture, since the talk after mine was Jean-Louis Foulley’s ranking of the Euro 2012 teams. Quite an interesting talk (even though I am not particularly interested in football and even though I dozed a little, steaming out the downpour I had received on my bike ride there…) I am also sorry I missed the next talk by Jean-Louis on Galton’s quincunx. (Unfortunately, his slides are not [yet?] on-line.)
As a coincidence, after launching a BayesComp page on Google+ (as an aside, I am quite nonplussed by the purpose of Google+), Nicolas Chopin also just started a Bayes in Paris webpage, in connection with our informal seminar/reading group at CREST. With the appropriate picture this time, i.e. a street plaque remembering… Laplace! May I suggest the RER stop Laplace and his statue in the Paris observatory as additional illustrations for the other pages…
No, no, this is not an announcement for a meeting on an Australian beach (that would be Bayes on the Beach, taking place next November (6-8) on the Sunshine Coast, organised by Kerrie Mengersen’s BRAG at QUT, which I just left! With Robert Wolpert as the international keynote speaker and Matt Wand as the Australian keynote speaker.) Bayes by the Bay is “a pedagogical workshop on Bayesian methods in Science” organised by the Institute of Mathematical Sciences, based on the CIT campus in Chennai. It is taking place on January 4-8, 2013, in Pondichéry. (To use the French spelling of this former comptoir of French India…) Just prior to the ISBA Varanasi meeting on Bayesian Statistics.
Great: the webpage for the workshop uses the attached picture of Pierre-Simon (de) Laplace, rather than the unlikely picture of Thomas Bayes found all over the place (incl. this blog!). This was also the case in Christensen et al.’s Bayesian ideas and data analysis. So maybe there is a trend there. I also like the name “Bayes by the Bay“, as it reminds me of a kid song we used to sing to/with our kids when they were young, “down by the bay“, after a summer vacation with Anne and George Casella…
The debate about non-informative priors has been going on for ages, at least since the end of the 19th century with criticisms by Bertrand and de Morgan about the lack of invariance of Laplace’s uniform priors (the same criticism reported by Stéphane Laurent in the above comments). This lack of invariance sounded like a death blow for the Bayesian approach and, while some Bayesians were desperately trying to cling to specific distributions, using less-than-formal arguments, others had a wider vision of a larger picture where priors could be used in situations where there was hardly any prior information, beyond the shape of the likelihood itself. (This was even before Abraham Wald established his admissibility and complete class results about Bayes procedures. And at about the same time as E.J.G. Pitman gave an “objective” derivation of the best invariant estimator as a Bayes estimator against the corresponding Haar measure…)
This vision is best represented by Jeffreys’ distributions, in which the information matrix of the sampling model, $I(\theta)$, is turned into a prior distribution

$$\pi(\theta) \propto |I(\theta)|^{1/2},$$

which is most often improper, i.e. does not integrate to a finite value. The label “non-informative” associated with Jeffreys’ priors is rather unfortunate, as they represent an input from the statistician, hence are informative about something! Similarly, “objective” has an authoritative weight I dislike… I thus prefer the label “reference prior”, used for instance by José Bernardo.
Those priors indeed give a reference against which one can compute either the reference estimator/test/prediction or one’s own estimator/test/prediction using a different prior motivated by subjective and objective items of information. To answer directly the question, “why not use only informative priors?”, there is actually no answer. A prior distribution is a choice made by the statistician, neither a state of Nature nor a hidden variable. In other words, there is no “best prior” that one “should use”. It is the very nature of statistical inference that there is no “best answer”.
Hence my defence of the noninformative/reference choice! It provides the same range of inferential tools as other priors, but gives answers that are inspired only by the shape of the likelihood function, rather than induced by some opinion about the range of the unknown parameters.
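As a side note of mine (an illustration, not part of the original argument), the Jeffreys construction above can be checked symbolically. Here is a minimal SymPy sketch for a Bernoulli(θ) model, where the recipe recovers the familiar Beta(1/2,1/2) reference prior:

```python
import sympy as sp

# Jeffreys prior sketch for a Bernoulli(theta) model:
# pi(theta) is proportional to sqrt(I(theta)), I the Fisher information.
theta, x = sp.symbols('theta x', positive=True)

# log-likelihood of a single Bernoulli observation x in {0, 1}
loglik = x * sp.log(theta) + (1 - x) * sp.log(1 - theta)

# Fisher information: E[-d^2/dtheta^2 log f(x|theta)], using E[x] = theta
info = sp.simplify((-sp.diff(loglik, theta, 2)).subs(x, theta))
# info simplifies to 1/(theta*(1-theta))

# Jeffreys prior kernel: theta^(-1/2) * (1-theta)^(-1/2)
jeffreys = sp.sqrt(info)

# Here the prior is actually proper: the normalising constant is
# B(1/2, 1/2) = pi, i.e. the Beta(1/2, 1/2) distribution.
print(sp.integrate(jeffreys, (theta, 0, 1)))
```

(In this lucky case the Jeffreys prior is proper; for, e.g., a Gaussian mean the same recipe yields the improper flat prior, matching the remark above about non-integrability.)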
The Brazilian society for Bayesian Analysis (ISBrA, whose annual meeting is taking place at this very time!) asked me to write a review on Pierre Simon Laplace’s book, Théorie Analytique des Probabilités, a book that was initially published in 1812, exactly two centuries ago. I promptly accepted this request as (a) I had never looked at this book and so this provided me with a perfect opportunity to do so, (b) while in Vancouver, Julien Cornebise had bought for me a 1967 reproduction of the 1812 edition, (c) I was curious to see how much of the book had permeated modern probability and statistics or, conversely, how much of Laplace’s perspective was still understandable by modern day standards. (Note that the link on the book leads to a free version of the 1814, not 1812, edition of the book, as free as the kindle version on amazon.)
“Je m’attache surtout, à déterminer la probabilité des causes et des résultats indiqués par événemens considérés en grand nombre.” [“I endeavour above all to determine the probability of the causes and of the results indicated by events considered in great number.”] P.S. Laplace, Théorie Analytique des Probabilités, page 3
First, I must acknowledge I found the book rather difficult to read, and this for several reasons: (a) as is the case for books from older times, the ratio of text to formulae is very high, with an inconvenient typography and page layout (at least by current standards), so speed-reading is impossible; (b) the themes offered in succession are often abruptly introduced and disconnected from the previous ones; (c) the mathematical notations are 18th-century, so sums are indicated by S, exponentials by c, and so on, which again slows down reading and understanding; (d) for all of the above reasons, I often missed the big picture and got mired in technical details until they made sense or I gave up; (e) I never quite understood whether Laplace was interested in analytical tools like generating functions only as a way to provide precise numerical approximations, or for their own sake. Hence a form of disappointment by the end of the book, most likely due to my insufficient investment in the project (on which I mostly spent an Amsterdam/Calgary flight and jet-lagged nights at BIRS…), even though I got excited by finding the bits and pieces about Bayesian estimation and testing.
Here is [yet!] another Bayesian textbook that appeared recently. I read it in the past few days and, despite my obvious biases and prejudices, I liked it very much! It has a lot in common (at least in spirit) with our Bayesian Core, which may explain why I feel so benevolent towards Bayesian ideas and data analysis. Just like ours, the book by Ron Christensen, Wes Johnson, Adam Branscum, and Timothy Hanson is indeed focused on explaining the Bayesian ideas through (real) examples, and it covers a lot of regression models, all the way to non-parametrics. It contains a good proportion of WinBUGS and R code. It intermingles methodology and computational chapters in the first part, before moving to the serious business of analysing more and more complex regression models. Exercises appear throughout the text rather than at the end of the chapters. As their book is larger (over 500 pages), the authors spend more time analysing various datasets in each chapter and, more importantly, provide a rather unique entry on prior assessment and construction, especially in the regression chapters. The author index is rather original in that it links the authors with more than one entry to the topics they are connected with (Ron Christensen winning the game with the highest number of entries).
A few days ago, I had lunch with Sharon McGrayne in a Parisian café and we had a wonderful chat about the people she had met during the preparation of her book, the theory that would not die. Among others, she mentioned the considerable support provided by Dennis Lindley, Persi Diaconis, and Bernard Bru. She also told me about a few unsavoury characters who simply refused to talk to her about the struggles and rise of Bayesian statistics. Then, once I had biked home, her book had at last arrived in my mailbox! How timely! (Actually, getting the book before would have been better, as I would have been able to ask more specific questions. But it seems the publisher, Yale University Press, had not forecasted the phenomenal success of the book and thus failed to scale the reprints accordingly!)
Here is thus my enthusiastic (and obviously biased) reaction to the theory that would not die. It tells the story and the stories of Bayesian statistics and of Bayesians in a most genial and entertaining manner. There may be some who will object to such a personification of science, which should be (much) more than the sum of the characters who contributed to it. However, I will defend the perspective that (Bayesian) statistical science is as much philosophy as it is mathematics and computer science, and thus that the components that led to its current state were contributed by individuals, for whom the path to those components mattered. While the book inevitably starts with the (patchy) story of Thomas Bayes’s life, incl. his passage in Edinburgh, and a nice non-mathematical description of his ball experiment, the next chapter is about “the man who did everything”, …, yes indeed, Pierre-Simon (de) Laplace himself! (An additional nice touch is the use of lower case everywhere, instead of an inflation of upper case letters!) How Laplace attacked the issue of astronomical errors is brilliantly depicted, rooting the man within statistics and explaining why he would soon move to the “probability of causes”. And rediscover plus generalise Bayes’ theorem. That his (rather unpleasant!) thirst for honours and official positions would later cast disrepute on his scientific worth is difficult to fathom, esp. when coming from knowledgeable statisticians like Florence Nightingale David. The next chapter is about the dark ages of [not yet] Bayesian statistics and I particularly liked the links with the French army, discovering there that the great Henri Poincaré testified at Dreyfus’ trial using a Bayesian argument, that Bertillon had completely missed the probabilistic point, and that the military judges were then all aware of Bayes’ theorem, thanks to Bertrand’s probability book being used at École Polytechnique!
(The last point actually was less of a surprise, given that I had collected some documents about the involvement of late 19th/early 20th century artillery officers in the development of Bayesian techniques, Edmond Lhostes and Maurice Dumas, in connection with Lyle Broemeling’s Biometrika study.) The description of the fights between Fisher and Bayesians and non-Bayesians alike is as always both entertaining and sad. Sad also is the fact that Jeffreys’ masterpiece got so little recognition at the time. (While I knew about Fisher’s unreasonable stand on smoking, going as far as defending the assumption that “lung cancer might cause smoking”(!), the Bayesian analysis of Jerome Cornfield was unknown to me. And quite fascinating.) The figure of Fisher actually permeates the whole book, as a negative bullying figure preventing further developments of early Bayesian statistics, but also as an ambivalent anti-Bayesian who eventually tried to create his own brand of Bayesian statistics in the form of fiducial statistics…
“…and then there was the ghastly de Gaulle.” D. Lindley
The following part of the theory that would not die is about Bayes’ contributions to the war (WWII), at least from the Allied side. Again, I knew most of the facts about Alan Turing and Bletchley Park’s Enigma, however the story is well-told and, as on previous occasions, I cannot but be moved by the waste of such a superb intellect, thanks to the stupidity of governments. The role of Albert Madansky in the assessment of the [lack of] safety of nuclear weapons is also well-described, stressing the inevitability of a Bayesian assessment of a one-time event that had [thankfully] not yet happened. The above quote from Dennis Lindley is the conclusion of his argument on why Bayesian statistics were not called Laplacean; I would think instead that the French post-war attraction for abstract statistics in the wake of Bourbaki did more against this recognition than de Gaulle’s isolationism and ghastliness. The involvement of John Tukey in military research was also a novelty for me, but not so much as his use of Bayesian [small area] methods for NBC election night predictions. (They could not hire José nor Andrew at the time.) The conclusion of Chapter 14 on why Tukey felt the need to distance himself from Bayesianism is quite compelling. Maybe paradoxically, I ended up appreciating Chapter 15 even more for the part about the search for a missing H-bomb near Palomares, Spain, as it exposes the benefits a Bayesian analysis would have brought.
“There are many classes of problems where Bayesian analyses are reasonable, mainly classes with which I have little acquaintance.” J. Tukey
When approaching more recent times and contemporaries, Sharon McGrayne gives a very detailed coverage of the coming-of-age of Bayesians like Jimmy Savage and Dennis Lindley, as well as the impact of Stein’s paradox (a personal epiphany!), along with the important impact of Howard Raiffa and Robert Schlaifer, both on business schools and on modelling prior beliefs [via conjugate priors]. I did not know anything about their scientific careers, but Applied Statistical Decision Theory is a beautiful book that prefigured both DeGroot‘s and Berger‘s. (As an aside, I was amused by Raiffa using Bayesian techniques for horse betting based on race bettors, as I had vaguely played with the idea during my spare if compulsory time in the French Navy!) Similarly, while I’d read detailed scientific accounts of Frederick Mosteller’s and David Wallace’s superb Federalist Papers study, they were only names to me. Chapter 12 mostly remedied this lack of mine.
“We are just starting” P. Diaconis
The final part, entitled Eureka!, is about the computer revolution we witnessed in the 1980s, culminating with the (re)discovery of MCMC methods we covered in our own “history”. Because it contains stories that are closer and closer to today’s time, it inevitably crumbles into shorter and shorter accounts. However, the theory that would not die conveys the essential message that Bayes’ rule had become operational, with its own computer language and objects like graphical models and Bayesian networks that could tackle huge amounts of data and real-time constraints. And be used by companies like Microsoft and Google. The final pages mention neurological experiments on how the brain operates in a Bayesian-like way (a direction much followed by the neurosciences, as illustrated by Peggy Series’ talk at Bayes-250).
In conclusion, I highly enjoyed reading through the theory that would not die. And I am sure most of my Bayesian colleagues will as well. Being Bayesians, they will compare the contents with their subjective priors about Bayesian history, but will in the end update those profitably. (The most obvious missing part is in my opinion the absence of E.T. Jaynes and the MaxEnt community, which would deserve a chapter of its own.) Maybe ISBA could consider supporting a paperback or electronic copy to distribute to all its members! As an insider, I have little idea of how the book would be perceived by the layman: it does not contain any formula apart from [the discrete] Bayes’ rule at some point, so everyone can read it through. The current success of the theory that would not die shows that it reaches much further than academic circles. It may be that the general public does not necessarily grasp the ultimate difference between frequentists and Bayesians, or between Fisherians and Neyman-Pearsonians. However, the theory that would not die goes over all the elements that explain these differences. In particular, the parts about single events are quite illuminating on the specificities of the Bayesian approach. I will certainly [more than] recommend it to all of my graduate students (and buy the French version for my mother once it is translated, so that she finally understands why I once gave a talk “Don’t tell my mom I am Bayesian” at ENSAE…!) If there is any doubt from the above, I obviously recommend the book to all Og’s readers!