**A**n interesting question from X validated about constructing pseudo-priors for Bayesian model selection. Namely, how useful are these for the concept rather than the implementation? The only case where I am aware of pseudo-priors being used is in Bayesian MCMC algorithms such as Carlin and Chib (1995), where the distributions are used to complement the posterior distribution conditional on a single model (index) into a joint distribution across all model parameters. The trick of this construction is that the pseudo-priors can be essentially anything, including depending on the data as well. And while the impact the ability of the resulting Markov chain to move between spaces, they have no say on the resulting inference, either when choosing a model or when estimating the parameters of a chosen model. The concept of pseudo-priors was also central to the mis-interpretations found in Congdon (2006) and Scott (2002). Which we reanalysed with Jean-Michel Marin in Bayesian Analysis (2008) as the distinction between model-based posteriors and joint pseudo-posteriors.

## Archive for posterior probability

## are pseudopriors required in Bayesian model selection?

Posted in Books, Kids, pictures, Statistics, University life with tags Bayesian Analysis, canons, cross validated, Invalides, Joe Abercrombie, joint pseudo-posterior, model choice, Paris, posterior probability, pseudo-priors, The Last Argument of Kings on February 29, 2020 by xi'an## Bertrand-Borel debate

Posted in Books, Statistics with tags Bayes factor, Bayesian hypothesis testing, Bayesian model selection, Bertrand's paradox, conditioning, Deborah Mayo, Emile Borel, Erich Lehmann, Joseph Bertrand, Le Hasard, Pierre Simon Laplace, Pleiades, posterior probability, uniformly most powerful tests on May 6, 2019 by xi'an**O**n her blog, Deborah Mayo briefly mentioned the Bertrand-Borel debate on the (in)feasibility of hypothesis testing, as reported [and translated] by Erich Lehmann. A first interesting feature is that both [starting with] B mathematicians discuss the probability of causes in the Bayesian spirit of Laplace. With Bertrand considering that the prior probabilities of the different causes are impossible to set and then moving all the way to dismiss the use of probability theory in this setting, nipping the p-values in the bud..! And Borel being rather vague about the solution probability theory has to provide. As stressed by Lehmann.

“The Pleiades appear closer to each other than one would naturally expect. This statement deserves thinking about; but when one wants to translate the phenomenon into numbers, the necessary ingredients are lacking. In order to make the vague idea of closeness more precise, should we look for the smallest circle that contains the group? the largest of the angular distances? the sum of squares of all the distances? the area of the spherical polygon of which some of the stars are the vertices and which contains the others in its interior? Each of these quantities is smaller for the group of the Pleiades than seems plausible. Which of them should provide the measure of implausibility? If three of the stars form an equilateral triangle, do we have to add this circumstance, which is certainly very unlikely apriori, to those that point to a cause?” Joseph Bertrand (p.166)

“But whatever objection one can raise from a logical point of view cannot prevent the preceding question from arising in many situations: the theory of probability cannot refuse to examine it and to give an answer; the precision of the response will naturally be limited by the lack of precision in the question; but to refuse to answer under the pretext that the answer cannot be absolutely precise, is to place oneself on purely abstract grounds and tomisunderstand the essential nature of the application of mathematics.” Emile Borel (Chapter 4)

Another highly interesting objection of Bertrand is somewhat linked with his conditioning paradox, namely that the density of the observed unlikely event depends on the choice of the statistic that is used to calibrate the unlikeliness, which makes complete sense in that the information contained in each of these statistics and the resulting probability or likelihood differ to an arbitrary extend, that there are few cases (monotone likelihood ratio) where the choice can be made, and that Bayes factors share the same drawback if they do not condition upon the entire sample. In which case there is no selection of “circonstances remarquables”. Or of uniformly most powerful tests.

## a question from McGill about The Bayesian Choice

Posted in Books, pictures, Running, Statistics, Travel, University life with tags Bayes factors, Bayesian hypothesis testing, Canada, cross validated, improper prior, McGill University, Montréal, posterior probability on December 26, 2018 by xi'an**I** received an email from a group of McGill students working on Bayesian statistics and using The Bayesian Choice (although the exercise pictured below is not in the book, the closest being exercise 1.53 inspired from Raiffa and Shlaiffer, 1961, and exercise 5.10 as mentioned in the email):

There was a question that some of us cannot seem to decide what is the correct answer. Here are the issues,

Some people believe that the answer to both is ½, while others believe it is 1. The reasoning for ½ is that since Beta is a continuous distribution, we never could have θ exactly equal to ½. Thus regardless of α, the probability that θ=½ in that case is 0. Hence it is ½. I found a related stack exchange question that seems to indicate this as well.

The other side is that by Markov property and mean of Beta(a,a), as α goes to infinity , we will approach ½ with probability 1. And hence the limit as α goes to infinity for both (a) and (b) is 1. I think this also could make sense in another context, as if you use the Bayes factor representation. This is similar I believe to the questions in the Bayesian Choice, 5.10, and 5.11.

As it happens, the answer is ½ in the first case (a) because π(H⁰) is ½ regardless of α and 1 in the second case (b) because the evidence against H⁰ goes to zero as α goes to zero *(watch out!)*, along with the mass of the prior on any compact of (0,1) since Γ(2α)/Γ(α)². (The limit does not correspond to a proper prior and hence is somewhat meaningless.) However, when α goes to infinity, the evidence against H⁰ goes to infinity and the posterior probability of ½ goes to zero, despite the prior under the alternative being more and more concentrated around ½!