Archive for Bayesian foundations

the first Bayesian

Posted in Statistics on February 20, 2018 by xi'an

In the first issue of Statistical Science for this year (2018), Stephen Stigler pursues the origins of Bayesianism as attributable to Richard Price, main author of Bayes’ Essay. (This incidentally relates to an earlier ‘Og piece on that notion!) Steve points out the considerable input of Price on this Essay, even though the mathematical advance is very likely to be entirely Bayes’. It may however well be Price who initiated Bayes’ reflections on the matter, towards producing a counter-argument to Hume’s “Of Miracles”.

“Price’s caution in addressing the probabilities of hypotheses suggested by data is rare in early literature.”

A section of the paper is about Price’s approach to data-determined hypotheses and to the fact that considering such hypotheses cannot easily fit within a Bayesian framework. As stated by Price, “it would be improbable as infinite to one”. Which is a nice way to address the infinite mass prior.

 

Practicals of Uncertainty [book review]

Posted in Books, Statistics, University life on December 22, 2017 by xi'an

On my way to the O’Bayes 2017 conference in Austin, I [paradoxically!] went through Jay Kadane’s Pragmatics of Uncertainty, which had been published earlier this year by CRC Press. The book is to be seen as a practical illustration of the Principles of Uncertainty Jay wrote in 2011 (and I reviewed for CHANCE). The avowed purpose is to allow the reader to check through Jay’s applied work whether or not he had “made good” on setting out clearly the motivations for his subjective Bayesian modelling. (While I presume the use of the same P of U in both books is mostly a coincidence, I started wondering what a third P of U volume could be called. Perils of Uncertainty? Peddlers of Uncertainty? The game is afoot!)

The structure of the book is a collection of fifteen case studies undertaken by Jay over the past 30 years, covering paleontology, survey sampling, legal expertise, physics, climate, and even medieval Norwegian history. Each chapter starts with a short introduction that often explains how he came by the problem (most often as an interesting PhD student consulting project at CMU), what were the difficulties in the analysis, and what became of his co-authors. As noted by the author, the main bulk of each chapter is the reprint (in a unified style) of the paper, and most of these papers are freely available on-line. Each chapter concludes with an epilogue (or post-mortem) that re-considers (very briefly) what had been done, what could have been done, and whether or not the Bayesian perspective was useful for the problem (unsurprisingly so for the majority of the chapters!). There are also reading suggestions in the other P of U and a few exercises.

“The purpose of the book is philosophical, to address, with specific examples, the question of whether Bayesian statistics is ready for prime time. Can it be used in a variety of applied settings to address real applied problems?”

The book thus comes as a logical complement of the Principles, to demonstrate how Jay himself did apply his Bayesian principles to specific cases and how one can set the construction of a prior, of a loss function, or of a statistical model in identifiable parts that can then be criticised or reanalysed. I find browsing through this series of fourteen different problems fascinating and exhilarating, and I admire Jay’s dedication to every case he presents in the book. I also feel that this comes as a perfect complement to the earlier P of U, in that it makes referring to a complete application of a given principle most straightforward, the problem being entirely described, analysed, and in most cases solved within a given chapter. A few chapters include discussions, having been published in the Valencia meeting proceedings or in journals with discussion.

While all papers have been reset in the book style, I wish the graphs had been edited as well, as they do not always look pretty. Although this would have implied a massive effort, it would also have been great had each chapter and problem been re-analysed, or at least discussed, by another fellow (?!) Bayesian, in order to illustrate the impact of individual modelling sensibilities. This may however be a future project for a graduate class, assuming all datasets are available, which is unclear from the text.

“We think however that Bayes factors are overemphasized. In the very special case in which there are only two possible “states of the world”, Bayes factors are sufficient. However in the typical case in which there are many possible states of the world, Bayes factors are sufficient only when the decision-maker’s loss has only two values.” (p. 278)

The above is in Jay’s reply to a comment from John Skilling regretting the absence of marginal likelihoods in the chapter. Reply to which I completely subscribe.

[Usual warning: this review should find its way into CHANCE book reviews at some point, with a fairly similar content.]

Why should I be Bayesian when my model is wrong?

Posted in Books, pictures, Running, Statistics, Travel, University life on May 9, 2017 by xi'an

Guillaume Dehaene posted the above question on X validated last Friday. Here is an excerpt from it:

However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.

My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e: computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.

Indeed, according to Kleijn and van der Vaart (2012), in the misspecified case, the posterior distribution converges as n→∞ to a Dirac distribution centred at the MLE, but it does not have the correct variance (unless two values just happen to be the same) to ensure that credible intervals of the posterior match confidence intervals for θ.

Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)
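
For the record, here is an informal sketch of the asymptotics behind Guillaume's statement (regularity conditions of Kleijn and van der Vaart omitted), with θ* the pseudo-true value and V and W the expected negative Hessian and the score variance at θ*:

    \theta^\star = \arg\min_\theta \mathrm{KL}\big(p \,\Vert\, p(\cdot\mid\theta)\big), \qquad
    V = -\mathbb{E}_p\big[\nabla_\theta^2 \log p(X\mid\theta^\star)\big], \qquad
    W = \mathbb{E}_p\big[\nabla_\theta \log p(X\mid\theta^\star)\,\nabla_\theta \log p(X\mid\theta^\star)^{\mathsf T}\big],

    \sqrt{n}\,(\hat\theta_n - \theta^\star) \rightsquigarrow \mathcal{N}\big(0,\,V^{-1} W V^{-1}\big) \quad \text{[sampling distribution of the MLE, the sandwich]},

    \sqrt{n}\,(\theta - \hat\theta_n) \mid x_{1:n} \rightsquigarrow \mathcal{N}\big(0,\,V^{-1}\big) \quad \text{[limit of the posterior]}.

The two variances coincide only when V = W, i.e., under correct specification (by the information identity), which is why credible intervals lose their frequentist calibration in the misspecified case.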

A few thoughts about that meme that all models are wrong (resonating from last week's discussion):

  1. While the hypothetical model is indeed almost invariably and irremediably wrong, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any);
  2. There exist Bayesian approaches that can do without the model, recent examples being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
  3. In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
  4. And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes model misspecification into account by replacing the likelihood with a down-graded version expressed as a power of the original likelihood (see the sketch after this list);
  5. The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a somewhat convoluted manner (and I added some comments on my blog);
  6. In a sense, Bayesians should be the least concerned among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.
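
To make point 4 above slightly more concrete (a sketch of the general tempering idea only, not of the SafeBayes rule for selecting the learning rate), the generalised posterior replaces the likelihood with a power of it,

    \pi_\eta(\theta \mid x_{1:n}) \;\propto\; \pi(\theta)\,\prod_{i=1}^{n} p(x_i \mid \theta)^{\eta}, \qquad \eta \in (0,1],

where η = 1 recovers the standard posterior and smaller values of η discount an over-confident, misspecified likelihood; SafeBayes then picks η from the data.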

Bayes is typically wrong…

Posted in pictures, Running, Statistics, Travel, University life on May 3, 2017 by xi'an

At Harvard, this morning, Don Fraser gave a talk at the Bayesian, Fiducial, and Frequentist conference where he repeated [as shown by the above quote] the rather harsh criticisms of Bayesian inference he published last year in Statistical Science, and which I discussed a few days ago. The “wrongness” of Bayes starts with the completely arbitrary choice of the prior, which Don sees as unacceptable, and then increases because credible regions are not confidence regions, outside of natural parameters from exponential families (Welch and Peers, 1963) and of one-dimensional parameters using the profile likelihood (although I cannot find a proper definition of what the profile likelihood is in the paper; apparently a plug-in version that is not a genuine likelihood, hence somewhat falling under the same this-is-not-a-true-probability cleaver as the disputed Bayesian approach).

“I expect we’re all missing something, but I do not know what it is.” D.R. Cox, Statistical Science, 1994

And then Nancy Reid delivered a plenary lecture, “Are we converging?”, in the afternoon, comparing most principles (including objective if not subjective Bayes) against different criteria, like consistency, nuisance elimination, calibration, meaning of probability, and so on, in a highly analytic if pessimistic panorama. (The talk should be available on-line at some point soon.)

on Dutch book arguments

Posted in Books, Kids, pictures, Statistics, Travel, University life on May 1, 2017 by xi'an

“Reality is not always probable, or likely.”― Jorge Luis Borges

As I am supposed to discuss Teddy Seidenfeld‘s talk at the Bayes, Fiducial and Frequentist conference at Harvard today [the snow happened last time!], I started last week [while driving to Wales] reading some related papers of his. Which is great, as I had never managed to get through the Dutch book arguments, including those in Jim’s book.

The paper by Mark Schervish, Teddy Seidenfeld, and Jay Kadane defines coherence as the inability to bet against the predictive statements based on the procedure. A definition that sounds like a self-fulfilling prophecy to me, as it involves a probability measure over the parameter space. Furthermore, the notion of turning inference, which aims at scientific validation, into a leisurely, no-added-value, and somewhat ethically dodgy activity like gambling does not agree with my notion of a validation for a theory. That is, not as a compelling reason for adopting a Bayesian approach. Not that I have suddenly switched to the other [darker] side, but I do not feel those arguments help in any way, because of this dodgy image associated with gambling. (Pardon my French, but each time I read about escrows, I think of escrocs, or crooks, which reinforces this image! Actually, this name derives from the Old French escroue, but the modern meaning of écroué is “sent to jail”, which brings us back to the same feeling…)

Furthermore, it sounds like both a weak notion, since it implies an almost sure loss for the bookmaker and coherency holds for any prior distribution, including Dirac masses!, and a frequentist one, in that it looks at all possible values of the parameter (in a statistical framework). It also turns errors into monetary losses, taking them at face value, which also sounds very formal to me.

But the most fundamental problem I have with this approach is that, from a Bayesian perspective, it does not bring any evaluation or ranking of priors, and in particular does not help in selecting or eliminating some. By behaving like a minimax principle, it does not condition on the data and hence does not evaluate the predictive properties of the model in terms of the data, e.g. by comparing pseudo-data with real data.

While I see no reason to argue in favour of p-values or minimax decision rules, I am at a loss understanding the examples in How to not gamble if you must. In the first case, i.e., when dismissing the α-level most powerful test in the simple vs. simple hypothesis testing case, the argument (in Example 4) starts from the classical (Neyman-Pearsonist) statistician favouring the 0.05-level test over others. Which sounds absurd, as this level corresponds to a given loss function, which cannot be compared with another loss function. Even though the authors chose to rephrase the dilemma in terms of a single 0-1 loss function and then turn the classical solution into the choice of an implicit variance-dependent prior. Plus forcing the poor Pearsonist to make a wager represented by the risk difference. The whole sequence of choices sounds both very convoluted and far away from the usual practice of a classical statistician…

Similarly, when attacking [in Section 5.2] the minimax estimator in the Bernoulli case (for the corresponding proper prior depending on the sample size n), this minimax estimator is admissible under quadratic loss and still a Dutch book argument applies, which in my opinion definitely argues against the Dutch book reasoning. The way to produce such a domination result is to mix two Bernoulli estimation problems for two different sample sizes but the same parameter value, in which case there exist [other] choices of Beta priors and a convex combination of the risk functions that lead to this domination. But this example [Example 6] mostly exposes the artificial nature of the argument: when estimating the very same probability θ, what is the relevance of adding the risks or errors resulting from using two estimators for two different sample sizes? Of the very same probability θ. I insist on the very same because, when instead estimating two [independent] values of θ, there cannot be a Stein effect for the Bernoulli probability estimation problem, that is, any aggregation of admissible estimators remains admissible. (And yes, it definitely sounds like an exercise in frequentist decision theory!)
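
As an aside, here is a minimal numerical sketch (mine, not the paper's) of the objects entering this Bernoulli example: the minimax estimator under quadratic loss, i.e., the Bayes estimator associated with the Be(√n/2, √n/2) prior, and its exact frequentist risk, computed for two sample sizes before any convex combination of the risk functions is formed:

import numpy as np
from scipy.stats import binom

def minimax_estimate(x, n):
    # Bayes estimator under the Beta(sqrt(n)/2, sqrt(n)/2) prior,
    # minimax for quadratic loss in the Binomial(n, theta) problem
    return (x + np.sqrt(n) / 2) / (n + np.sqrt(n))

def quadratic_risk(n, theta, estimator):
    # exact frequentist risk E[(delta(X) - theta)^2] for X ~ Binomial(n, theta)
    x = np.arange(n + 1)
    return np.sum(binom.pmf(x, n, theta) * (estimator(x, n) - theta) ** 2)

thetas = np.linspace(0.01, 0.99, 99)
n1, n2 = 10, 40
risk1 = np.array([quadratic_risk(n1, t, minimax_estimate) for t in thetas])
risk2 = np.array([quadratic_risk(n2, t, minimax_estimate) for t in thetas])

# constant risk 1 / (4 (sqrt(n) + 1)^2), as expected for the minimax estimator
print(risk1.min(), risk1.max(), 1 / (4 * (np.sqrt(n1) + 1) ** 2))
print(risk2.min(), risk2.max(), 1 / (4 * (np.sqrt(n2) + 1) ** 2))

# a convex combination of the two risk functions, as used in the domination argument
alpha = 0.5
combined_risk = alpha * risk1 + (1 - alpha) * risk2

Running it confirms the constant risk 1/4(√n+1)² for each sample size; the domination in Example 6 then hinges on the choice of the two Beta priors and of the weights in the convex combination, which the sketch does not reproduce.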

en route to Boston!

Posted in pictures, Running, Travel, University life on April 29, 2017 by xi'an

Bayes, reproducibility and the Quest for Truth

Posted in Books, Statistics, University life on April 27, 2017 by xi'an

Don Fraser, Mylène Bédard, and three coauthors have written a paper with the above dramatic title in Statistical Science, about the reproducibility of Bayesian inference in the framework of what they call a mathematical prior. It connects with the earlier quick-and-dirty tag attributed by Don to Bayesian credible intervals.

“We provide simple (…) counter-examples to general claims that Bayes can offer accuracy for statistical inference. To obtain this accuracy with Bayes, more effort is required compared to recent likelihood methods (…) [and] accuracy beyond first order is routinely not available (…) An alternative is to view default Bayes as an exploratory technique and then ask does it do as it overtly claims? Is it reproducible as understood in contemporary science? (…) No one has answers although speculative claims abound.” (p. 1)

The early stages of the paper question the nature of a prior distribution in terms of objectivity and reproducibility, which strikes me as a return to older debates on the nature of probability, and as a dubious insistence on the reality of a prior when the said reality is customarily and implicitly assumed for the sampling distribution. While we “can certainly ask how [a posterior] quantile relates to the true value of the parameter”, I see no compelling reason why the associated quantile should be endowed with a frequentist coverage meaning, i.e., be more than a normative indication of the deviation from the true value. (Assuming there is such a parameter.) To consider that the credible interval of interest can be “objectively” assessed by simulation experiments evaluating its coverage is thus doomed from the start (since there is no reason for the nominal coverage to hold) and situated on the wrong plane, since it stems from the hypothetical frequentist model for a range of parameter values. Instead, I find simulations from (generating) models useful in a general ABC sense, namely that, by producing realisations from the predictive, one can assess to which degree of roughness the data is compatible with the formal construct. To bind reproducibility to the frequentist framework thus sounds wrong [to me] as being model-based. In other words, I do not find the definition of reproducibility used in the paper to be objective (literally bouncing back from the Gelman and Hennig Read Paper).
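
To give a concrete [if entirely hypothetical] illustration of what I mean by such predictive simulations, here is a minimal sketch in the ABC spirit, with a Normal model, a crude stand-in for the posterior, and summary statistics chosen for the occasion:

import numpy as np

rng = np.random.default_rng(0)

# hypothetical observed data, standing in for a real dataset
x_obs = rng.standard_t(df=3, size=100)
n = len(x_obs)

def summaries(x):
    # summary statistics used to compare observed and simulated data
    return np.array([np.mean(x), np.std(x), np.mean(np.abs(x - np.median(x)))])

# crude stand-in for a posterior sample in a Normal(mu, sigma) model fitted to x_obs
# (draws around the MLE rather than an actual posterior simulation)
mu_draws = rng.normal(x_obs.mean(), x_obs.std() / np.sqrt(n), size=1000)
sigma_draws = np.abs(rng.normal(x_obs.std(), x_obs.std() / np.sqrt(2 * n), size=1000))

# predictive replications of the data and their summaries
s_obs = summaries(x_obs)
s_rep = np.array([summaries(rng.normal(m, s, size=n)) for m, s in zip(mu_draws, sigma_draws)])

# where do the observed summaries fall within the predictive distribution?
tail_areas = np.mean(s_rep >= s_obs, axis=0)
print("predictive tail areas for (mean, sd, mad):", tail_areas)

Summaries falling far in the tails of their predictive distributions indicate at which degree of roughness the formal construct stops being compatible with the data.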

At several points in the paper, the legal consequences of using a subjective prior are evoked as legally binding and implicitly as dangerous, with the example of the L’Aquila expert trial. I have trouble seeing the relevance of this entry, as an adverse lawyer is just as entitled to attack the expert on her or his sampling model. More fundamentally, I feel quite uneasy about bringing this type of argument into the debate!