## a question from McGill about The Bayesian Choice

Posted in Books, pictures, Running, Statistics, Travel, University life on December 26, 2018 by xi'an

I received an email from a group of McGill students working on Bayesian statistics and using The Bayesian Choice (although the exercise pictured below is not in the book, the closest matches being Exercise 1.53, inspired from Raiffa and Schlaifer, 1961, and Exercise 5.10, as mentioned in the email):

There is a question on which some of us cannot seem to agree about the correct answer. Here are the issues:

Some people believe that the answer to both is ½, while others believe it is 1. The reasoning for ½ is that, since the Beta is a continuous distribution, we could never have θ exactly equal to ½. Thus, regardless of α, the probability that θ=½ in that case is 0. Hence the answer is ½. I found a related Stack Exchange question that seems to indicate this as well.
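Both limits in this argument can be checked numerically. The sketch below (pure stdlib; the function name and midpoint quadrature are my own) computes the Beta(α,α) mass of an interval inside (0,1):

```python
from math import gamma

def beta_aa_mass(alpha, lo, hi, n=20000):
    """Probability that a Beta(alpha, alpha) variate falls in [lo, hi],
    by midpoint-rule integration of the density over (0, 1)."""
    norm = gamma(2 * alpha) / gamma(alpha) ** 2   # 1 / B(alpha, alpha)
    h = (hi - lo) / n
    total = sum(((lo + (i + 0.5) * h) * (1 - lo - (i + 0.5) * h)) ** (alpha - 1)
                for i in range(n))
    return norm * total * h
```

For any fixed α, the mass of a shrinking interval around ½ goes to zero, so P(θ=½)=0 under the prior; and for a fixed compact such as [¼,¾], the mass vanishes as α goes to zero, driven by the normalising constant Γ(2α)/Γ(α)² going to zero.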

The other side is that, by Markov's inequality and the mean of Beta(α,α), as α goes to infinity we will approach ½ with probability 1. And hence the limit as α goes to infinity for both (a) and (b) is 1. I think this could also make sense in another context, as when using the Bayes factor representation. This is similar, I believe, to Exercises 5.10 and 5.11 in The Bayesian Choice.

As it happens, the answer is ½ in the first case (a), because π(H⁰) is ½ regardless of α, and 1 in the second case (b), because the evidence against H⁰ goes to zero as α goes to zero (watch out!), along with the mass of the prior on any compact of (0,1), since the normalising constant Γ(2α)/Γ(α)² goes to zero. (The limit does not correspond to a proper prior and hence is somewhat meaningless.) However, when α goes to infinity, the evidence against H⁰ goes to infinity and the posterior probability of ½ goes to zero, despite the prior under the alternative being more and more concentrated around ½!

## back to the Bayesian Choice

Posted in Books, Kids, Statistics, University life on October 17, 2018 by xi'an

Surprisingly (or not?!), I received two requests about some exercises from The Bayesian Choice, one from a group of students at McGill having difficulties solving the above and wondering about the properness of the posterior (but missing the integration of x), to whom I sent back this correction. And another from the Czech Republic about a difficulty with the term "evaluation", by which I meant (pardon my French!) estimation.

## A new approach to Bayesian hypothesis testing

Posted in Books, Statistics on September 8, 2016 by xi'an

“The main purpose of this paper is to develop a new Bayesian hypothesis testing approach for the point null hypothesis testing (…) based on the Bayesian deviance and constructed in a decision theoretical framework. It can be regarded as the Bayesian version of the likelihood ratio test.” This paper got published in the Journal of Econometrics two years ago, but I only read it a few days ago, when Kerrie Mengersen pointed it out to me. Here is an interesting criticism of Bayes factors.

“In the meantime, unfortunately, Bayes factors also suffers from several theoretical and practical difficulties. First, when improper prior distributions are used, Bayes factors contains undefined constants and takes arbitrary values (…) Second, when a proper but vague prior distribution with a large spread is used to represent prior ignorance, Bayes factors tends to favour the null hypothesis. The problem may persist even when the sample size is large (…) Third, the calculation of Bayes factors generally requires the evaluation of marginal likelihoods. In many models, the marginal likelihoods may be difficult to compute.”

I completely agree with these points, which are part of a longer list in our testing by mixture estimation paper. The authors also rightly blame the rigidity of the 0-1 loss function behind the derivation of the Bayes factor. An alternative decision-theoretic approach based on the Kullback-Leibler distance has been proposed by José Bernardo and Raúl Rueda, in a 2002 paper, evaluating the average divergence between the null and the full under the full, with the slight drawback that any nuisance parameter has the same prior under both hypotheses. (Which makes me think of the Savage-Dickey paradox, since everything here seems to take place under the alternative.) And the larger drawback of requiring a lower bound for rejecting the null. (Although it could be calibrated under the null prior predictive.)

This paper suggests using instead the difference of the Bayesian deviances, which is the expected log ratio integrated against the posterior. (With the possible embarrassment of the quantity having no prior expectation, since the ratio depends on the data. But after all, the evidence or marginal likelihood faces the same “criticism”.) So it is a sort of Bayes factor on the logarithms, with a strong similarity with Bernardo & Rueda’s solution, since they are equal in expectation under the marginal. As in Dawid et al.’s recent paper, the logarithm removes the issue with the normalising constant and with the Lindley-Jeffreys paradox. The approach then needs to be calibrated in order to define a decision bound about the null. The asymptotic distribution of the criterion is χ²(p)−p, where p is the dimension of the parameter to be tested, but this sounds like falling back on frequentist tests. And the deadly .05 bounds. I would rather favour a calibration of the criterion using prior or posterior predictives under both models…
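To make "the expected log ratio integrated against the posterior" concrete, here is a minimal Monte Carlo sketch for the simplest setting I could pick, testing H⁰: μ=μ⁰ in a N(μ,1) model with a conjugate N(0,τ²) prior on μ. The model, the prior, and the function name are my own choices for illustration, not the paper's examples, and I ignore sign conventions for the deviance:

```python
import random

def deviance_difference(x, mu0=0.0, tau2=100.0, draws=50000, seed=1):
    """Monte Carlo estimate of 2 * E_post[ log p(x | mu) - log p(x | mu0) ],
    a difference-of-Bayesian-deviances criterion for H0: mu = mu0, in a
    N(mu, 1) model with a N(0, tau2) prior on mu (conjugate, so the
    posterior is Gaussian and can be sampled directly)."""
    n, s = len(x), sum(x)
    v = 1.0 / (n + 1.0 / tau2)      # posterior variance of mu
    m = v * s                        # posterior mean of mu
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(draws):
        mu = rng.gauss(m, v ** 0.5)
        # log p(x | mu) - log p(x | mu0): only the quadratic terms survive
        acc += -0.5 * sum((xi - mu) ** 2 - (xi - mu0) ** 2 for xi in x)
    return 2.0 * acc / draws
```

In this conjugate case the expectation is also available in closed form, which is a handy check on the Monte Carlo output; calibrating the resulting number against a decision bound is then the whole question raised above.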

## Measuring statistical evidence using relative belief [book review]

Posted in Books, Statistics, University life on July 22, 2015 by xi'an

“It is necessary to be vigilant to ensure that attempts to be mathematically general do not lead us to introduce absurdities into discussions of inference.” (p.8) This new book by Michael Evans (Toronto) summarises his views on statistical evidence (expanded in a large number of papers), which are a quite unique mix of Bayesian principles and less-Bayesian methodologies. I am quite glad I could receive a version of the book before it was published by CRC Press, thanks to Rob Carver (and Keith O’Rourke for warning me about it). [Warning: this is a rather long review and post, so readers may choose to opt out now!]

“The Bayes factor does not behave appropriately as a measure of belief, but it does behave appropriately as a measure of evidence.” (p.87)

Posted in Books, Statistics, University life on May 15, 2015 by xi'an

[Here is a reply by Pierre Druilhet to my comments on his paper.]

There are several goals in the paper, the last being the most important one.

The first one is to insist that considering θ as a parameter is not appropriate. We are in complete agreement on that point, but I prefer considering l(θ) as the parameter rather than N, mainly because it is much simpler. Knowing N, the law of l(θ) is given by the law of a random walk with 0 as a reflecting boundary (Jaynes, in his book, explores this link). So, for a given prior on N, we can derive a prior on l(θ). Since the random process that generates N is completely unknown, except that N is probably large, the true law of l(θ) is completely unknown, so we may as well put a prior on l(θ) directly.

The second one is to state explicitly that a flat prior on θ implies an exponentially increasing prior on l(θ). As an anecdote, Stone, in 1972, warned against this kind of prior for Gaussian models. Another interesting anecdote is that he cited Abbott's novel “Flatland: A Romance of Many Dimensions”, which describes a world where the dimension is changed. This is exactly the case in the Flatland paradox, since θ has to be seen in two dimensions rather than in one.
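The exponential blow-up can be made concrete in the free-group reading of Stone's paradox (my rendering of the setting: θ a reduced word over the two generators a, b and their inverses, l(θ) its length). The count of words at each length grows geometrically, so a flat prior over words induces an exponentially increasing prior on l(θ):

```python
def n_reduced_words(l):
    """Number of reduced words of length l over {a, b, a^-1, b^-1}:
    4 choices for the first letter, then 3 (anything but the inverse
    of the previous letter) for each of the remaining l - 1 letters."""
    return 1 if l == 0 else 4 * 3 ** (l - 1)

def induced_mass(l, L):
    """Mass that a flat prior over all reduced words of length <= L
    puts on the event l(theta) = l."""
    total = sum(n_reduced_words(k) for k in range(L + 1))
    return n_reduced_words(l) / total
```

The induced prior on l(θ) grows by a factor of 3 per unit length, so most of the flat prior's mass sits at the largest lengths, which is the source of the paradox.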

The third one is to make a distinction between randomness of the parameter and prior distribution, each one having its own rule. This point is extensively discussed in Section 2.3.
– In the intuitive reasoning, the probability of no annihilation involves the true joint distribution of (θ, x) and therefore the true unknown distribution of θ.
– In the Bayesian reasoning, the posterior probability of no annihilation is derived from the prior distribution, which is improper. The underlying idea is that a prior distribution does not obey probability rules but belongs to a projective space of measures. This is especially true if the prior does not represent accurate knowledge. In that case, there is no discontinuity between proper and improper priors, and therefore the impropriety of the distribution is not a key point. In that context, the joint and marginal distributions are irrelevant, not because the prior is improper, but because it is a prior and not a true law. If the prior were the true probability law of θ, then the flat distribution could not be considered as a limit of probability distributions.

For most applications, the distinction between prior and probability law is not necessary and even pedantic, but it may appear essential in some situations. For example, in the Jeffreys-Lindley paradox, we may note that the construction of the prior is not compatible with the projective space structure.

## improper priors, incorporated

Posted in Books, Statistics, University life on January 11, 2012 by xi'an

“If a statistical procedure is to be judged by a criterion such as a conventional loss function (…) we should not expect optimal results from a probabilistic theory that demands multiple observations and multiple parameters.” P. McCullagh & H. Han

Peter McCullagh and Han Han have just published in the Annals of Statistics a paper on Bayes’ theorem for improper mixtures. This is a fascinating piece of work, even though some parts do elude me… The authors indeed propose a framework based on Kingman’s Poisson point processes that allows the inclusion of (countable) improper priors in a coherent probabilistic framework. This framework requires the definition of a test set A in the sampling space, the observations then being the events Y∩A, Y being an infinite random set when the prior is infinite. It is therefore complicated to perceive this representation in a genuine Bayesian framework, i.e. for a single observation corresponding to a single parameter value. In that sense it seems closer to the original empirical Bayes, à la Robbins.
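A toy rendering of this construction may help (entirely my own choice of model, not an example from the paper: counting measure on integer θ as the improper prior, a N(θ,1) sampling density, and the test set A=[0,1]):

```python
import random
from math import erf, exp, sqrt

def _poisson(mean, rng):
    # Knuth's method, adequate for the small means arising here
    l, k, p = exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def observed_points(a=0.0, b=1.0, thetas=range(-30, 31), seed=0):
    """Points of a (theta, y) Poisson process falling in the test set
    A = [a, b], with intensity = counting measure on integer theta (an
    improper 'prior') times a N(theta, 1) density in y.  The full process
    has infinitely many points, but Y ∩ A is almost surely finite."""
    rng = random.Random(seed)
    pts = []
    for theta in thetas:
        mean = norm_cdf(b - theta) - norm_cdf(a - theta)  # E[# points in A]
        for _ in range(_poisson(mean, rng)):
            while True:   # rejection-sample y given theta and y in A
                y = rng.gauss(theta, 1.0)
                if a <= y <= b:
                    pts.append((theta, y))
                    break
    return pts
```

Only finitely many points land in A (here E[#(Y∩A)] = b−a, by a telescoping sum of Gaussian tail probabilities over the integer shifts), which is how the infinite random set Y is turned into a usable observation.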

“An improper mixture is designed for a generic class of problems, not necessarily related to one another scientifically, but all having the same mathematical structure.” P. McCullagh & H. Han

The paper thus misses, in my opinion, a clear link with the design of improper priors. And it does not offer a resolution of the improper prior Bayes factor conundrum. However, it provides a perfectly valid environment for working with improper priors. For instance, the final section on the marginalisation “paradoxes” is illuminating in this respect, as it does not demand using a limit of proper priors.

## Is Bayes posterior [quick and] dirty?!

Posted in Books, Statistics, University life on April 28, 2011 by xi'an

I have been asked to discuss the forthcoming Statistical Science paper by Don Fraser, Is Bayes posterior quick and dirty confidence?. The title was intriguing, if clearly provocative, and so I read through the whole paper… (The following is a draft of my discussion.)

The central point in Don’s paper seems to be a demonstration that Bayes confidence sets are not valid because they do not provide the proper frequentist coverage. While I appreciate the effort made therein of evaluating Bayesian bounds in a frequentist light, and while Don’s paper does shed new insight on this evaluation, the main point of the paper seems to be a radical reexamination of the relevance of the whole Bayesian approach to confidence regions. The outcome is rather surprising, in that the disagreement between Bayesian and frequentist perspectives is usually quite limited [in contrast with tests], in that the coverage statements agree to orders between $n^{-1/2}$ and $n^{-1}$, following older results by Welch and Peers (1963).
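The coverage question at stake can be illustrated by a small simulation (entirely my own toy setup, not Fraser's examples: a N(μ,1) sample, a N(0,τ²) prior, and the upper 95% posterior quantile used as a confidence bound):

```python
import random

def coverage_of_credible_bound(mu_true, tau2=1.0, n=10, reps=20000, seed=0):
    """Frequentist coverage of the event {mu_true <= upper 95% posterior
    quantile} for a size-n N(mu, 1) sample with a N(0, tau2) prior on mu.
    Conjugacy lets us work from the sample mean alone."""
    rng = random.Random(seed)
    z = 1.6448536269514722          # standard normal 0.95 quantile
    hits = 0
    for _ in range(reps):
        xbar = rng.gauss(mu_true, (1.0 / n) ** 0.5)
        v = 1.0 / (n + 1.0 / tau2)  # posterior variance
        m = v * n * xbar            # posterior mean
        hits += mu_true <= m + z * v ** 0.5
    return hits / reps
```

With a diffuse prior the bound is very nearly a 95% frequentist bound, matching the Welch and Peers agreement; with an informative prior and a true value far from the prior mean, the frequentist coverage of the Bayesian bound degrades markedly, which is the kind of mismatch Don's paper builds on.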