Next May 1-3, I will attend the 4th Bayesian, Fiducial and Frequentist Conference at Harvard University (hopefully not under snow at that time of year), which is a meeting between philosophers and statisticians about foundational thinking in statistics and inference under uncertainty. This should be fun! (Registration is now open.)
Archive for Bayesian Analysis
While visiting Warwick last week, Jean-Michel Marin pointed out and forwarded me this remarkable paper of Jerzy Neyman, published in 1937, and presented to the Royal Society by Harold Jeffreys.
“Leaving apart on one side the practical difficulty of achieving randomness and the meaning of this word when applied to actual experiments…”
“It may be useful to point out that although we are frequently witnessing controversies in which authors try to defend one or another system of the theory of probability as the only legitimate, I am of the opinion that several such theories may be and actually are legitimate, in spite of their occasionally contradicting one another. Each of these theories is based on some system of postulates, and so long as the postulates forming one particular system do not contradict each other and are sufficient to construct a theory, this is as legitimate as any other. “
This paper is fairly long in part because Neyman starts by setting Kolmogorov’s axioms of probability. This is of historical interest but also needed for Neyman to oppose his notion of probability to Jeffreys’ (which is the same from a formal perspective, I believe!). He actually spends a fair chunk on explaining why constants cannot have anything but trivial probability measures. Getting ready to state that an a priori distribution has no meaning (p.343) and that in the rare cases it does it is mostly unknown. While reading the paper, I thought that the distinction was more in terms of frequentist or conditional properties of the estimators, Neyman’s arguments paving the way to his definition of a confidence interval. Assuming repeatability of the experiment under the same conditions and therefore same parameter value (p.344).
“The advantage of the unbiassed [sic] estimates and the justification of their use lies in the fact that in cases frequently met the probability of their differing very much from the estimated parameters is small.”
“…the maximum likelihood estimates appear to be what could be called the best “almost unbiassed [sic]” estimates.”
It is also quite interesting to read that the principle for insisting on unbiasedness is one of producing small errors, because this is not that often the case, as shown by the complete class theorems of Wald (ten years later). And that maximum likelihood is somewhat relegated to a secondary rank, almost unbiased being understood as consistent. A most amusing part of the paper is when Neyman inverts the credible set into a confidence set, that is, turning what is random in a constant and vice-versa. With a justification that the credible interval has zero or one coverage, while the confidence interval has a long-run validity of returning the correct rate of success. What is equally amusing is that the boundaries of a credible interval turn into functions of the sample, hence could be evaluated on a frequentist basis, as done later by Dennis Lindley and others like Welch and Peers, but that Neyman fails to see this and turn the bounds into hard values. For a given sample.
“This, however, is not always the case, and in general there are two or more systems of confidence intervals possible corresponding to the same confidence coefficient α, such that for certain sample points, E’, the intervals in one system are shorter than those in the other, while for some other sample points, E”, the reverse is true.”
The resulting construction of a confidence interval is then awfully convoluted when compared with the derivation of an HPD region, going through regions of acceptance that are the dual of a confidence interval (in the sampling space), while apparently [from my hasty read] missing a rule to order them. And rejecting the notion of a confidence interval being possibly empty, which, while being of practical interest, clashes with its frequentist backup.
In a [sort of] coincidence, shortly after writing my review on Le bayésianisme aujourd’hui, I got invited by the book editor, Isabelle Drouet, to take part in a round-table on Bayesianism in La Sorbonne. Which constituted the first seminar in the monthly series of the séminaire “Probabilités, Décision, Incertitude”. Invitation that I accepted and honoured by taking place in this public debate (if not dispute) on all [or most] things Bayes. Along with Paul Egré (CNRS, Institut Jean Nicod) and Pascal Pernot (CNRS, Laboratoire de chimie physique). And without a neuroscientist, who could not or would not attend.
While nothing earthshaking came out of the seminar, and certainly not from me!, it was interesting to hear of the perspectives of my philosophy+psychology and chemistry colleagues, the former explaining his path from classical to Bayesian testing—while mentioning trying to read the book Statistical rethinking I reviewed a few months ago—and the later the difficulty to teach both colleagues and students the need for an assessment of uncertainty in measurements. And alluding to GUM, developed by the Bureau International des Poids et Mesures I visited last year. I tried to present my relativity viewpoints on the [relative] nature of the prior, to avoid the usual morass of debates on the nature and subjectivity of the prior, tried to explain Bayesian posteriors via ABC, mentioned examples from The Theorem that Would not Die, yet untranslated into French, and expressed reserves about the glorious future of Bayesian statistics as we know it. This seminar was fairly enjoyable, with none of the stress induced by the constraints of a radio-show. Just too bad it did not attract a wider audience!
It is quite rare to see a book published in French about Bayesian statistics and even rarer to find one that connects philosophy of science, foundations of probability, statistics, and applications in neurosciences and artificial intelligence. Le bayésianisme aujourd’hui (Bayesianism today) was edited by Isabelle Drouet, a Reader in Philosophy at La Sorbonne. And includes a chapter of mine on the basics of Bayesian inference (à la Bayesian Choice), written in French like the rest of the book.
The title of the book is rather surprising (to me) as I had never heard the term Bayesianism mentioned before. As shown by this link, the term apparently exists. (Even though I dislike the sound of it!) The notion is one of a probabilistic structure of knowledge and learning, à la Poincaré. As described in the beginning of the book. But I fear the arguments minimising the subjectivity of the Bayesian approach should not be advanced, following my new stance on the relativity of probabilistic statements, if only because they are defensive and open the path all too easily to counterarguments. Similarly, the argument according to which the “Big Data” era makesp the impact of the prior negligible and paradoxically justifies the use of Bayesian methods is limited to the case of little Big Data, i.e., when the observations are more or less iid with a limited number of parameters. Not when the number of parameters explodes. Another set of arguments that I find both more modern and compelling [for being modern is not necessarily a plus!] is the ease with which the Bayesian framework allows for integrative and cooperative learning. Along with its ultimate modularity, since each component of the learning mechanism can be extracted and replaced with an alternative. Continue reading
Quite a coincidence! I just came across another bug in Lynch’s (2007) book, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. Already discussed here and on X validated. While working with one participant to the post-ISBA softshop, we were looking for efficient approaches to simulating correlation matrices and came [by Google] across the above R code associated with a 3×3 correlation matrix, which misses the additional constraint that the determinant must be positive. As shown e.g. by the example
> eigen(matrix(c(1,-.8,.7,-.8,1,.6,.7,.6,1),ncol=3)) $values  1.8169834 1.5861960 -0.4031794
having all correlations between -1 and 1 is not enough. Just. Not. Enough.
“The world is full of obvious things which nobody by any chance ever observes.” The Hound of the Baskervilles
In connection with the incoming publication of James Watson’s and Chris Holmes’ Approximating models and robust decisions in Statistical Science, Judith Rousseau and I wrote a discussion on the paper that has been arXived yesterday.
“Overall, we consider that the calibration of the Kullback-Leibler divergence remains an open problem.” (p.18)
While the paper connects with earlier ones by Chris and coauthors, and possibly despite the overall critical tone of the comments!, I really appreciate the renewed interest in robustness advocated in this paper. I was going to write Bayesian robustness but to differ from the perspective adopted in the 90’s where robustness was mostly about the prior, I would say this is rather a Bayesian approach to model robustness from a decisional perspective. With definitive innovations like considering the impact of posterior uncertainty over the decision space, uncertainty being defined e.g. in terms of Kullback-Leibler neighbourhoods. Or with a Dirichlet process distribution on the posterior. This may step out of the standard Bayesian approach but it remains of definite interest! (And note that this discussion of ours [reluctantly!] refrained from capitalising on the names of the authors to build easy puns linked with the most Bayesian of all detectives!)