Archive for Bayesian hypothesis testing

statistics for making decisions [book review]

Posted in Statistics, Books on March 7, 2022 by xi'an

I bought this book [or more precisely received it from CRC Press as a ({prospective} book) review reward] as I was interested in the author’s perspectives on actual decision making (and unaware of the earlier Statistical Decision Theory book he had written in 2013). It is intended for a postgraduate semester course and “not for a beginner in statistics”. Exercises with solutions are included in each chapter (with some R code in the solutions). From Chapter 4 onwards, the “Further reading suggestions” primarily refer to papers and books written by the author, as these chapters are based on his earlier papers.

“I regard hypothesis testing as a distraction from and a barrier to good statistical practice. Its ritualised application should be resisted from the position of strength, by being well acquainted with all its theoretical and practical aspects. I very much hope (…) that the right place for hypothesis testing is in a museum, next to the steam engine.”

The first chapter exposes the shortcomings of hypothesis testing for decision making, in particular its ignoring the consequences of the decisions. A perspective with which I agree, but I fear the subsequent developments found in the book remain too formalised to be appealing, reverting to the over-simplification found in Neyman-Pearson theory. The second chapter is somewhat superfluous for a book assuming a prior exposure to statistics, with a quick exposition of the frequentist, Bayesian, and … fiducial paradigms. With estimators being first defined without referring to a specific loss function. And I find the presentation of the fiducial approach rather shaky (if usual). Esp. when considering that the fiducial perspective is used as default Bayes in the subsequent chapters. I also do not understand the notation (p.31)


outside of a Bayesian (or fiducial?) framework. (I did not spot typos aside from the traditional “the the” duplicates, with at least six occurrences!)

The aforementioned subsequent chapters are not particularly enticing as they cater to artificial loss functions and engage in detailed derivations that do not seem essential. At times they appear to be nothing more than simple calculus exercises. The very construction of the loss function, which I deem critical to implement statistical decision theory, is mostly bypassed. The overall setting is also frighteningly unidimensional. In the parameter, in the statistic, and in the decision. Covariates only appear in the final chapter, which appears to have very little connection with decision making in that the loss function there is the standard quadratic loss, used to achieve the optimal composition of estimators, rather than selecting the best model. The book is also lacking in practical or realistic illustrations.

“With a bit of immodesty and a tinge of obsession, I would like to refer to the principal theme of this book as a paradigm, ascribing to it as much importance and distinction as to the frequentist and Bayesian paradigms”

The book concludes with a short postscript (pp.247-249) reproducing the introductory paragraphs about the ill-suited nature of hypothesis testing for decision-making. Which would have been better supported by a stronger engagement in eliciting loss functions and quantifying the consequences of actions from the clients…

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

demystify Lindley’s paradox [or not]

Posted in Statistics on March 18, 2020 by xi'an

Another paper on Lindley’s paradox appeared on arXiv yesterday, by Guosheng Yin and Haolun Shi, interpreting posterior probabilities as p-values. The core of this resolution is to express a two-sided hypothesis as a combination of two one-sided hypotheses in opposite directions, then taking advantage of the near equivalence of posterior probabilities under some non-informative prior and p-values in the latter case. As already noted by George Casella and Roger Berger (1987) and presumably earlier. The point is that one-sided hypotheses are quite friendly to improper priors, since they only require a single prior distribution. Rather than two when point nulls are under consideration. The p-value created by merging both one-sided hypotheses makes little sense to me as it means testing that both θ≥0 and θ≤0, resulting in the proposal of a p-value that is twice the minimum of the one-sided p-values, maybe due to a Bonferroni correction, although the true value should be zero… I thus see little support for this approach to resolving Lindley’s paradox in that it bypasses the toxic nature of point-null hypotheses that require a change of prior toward a mixture supporting one hypothesis and the other. Here the posterior of the point-null hypothesis is defined in exactly the same way the p-value is defined, hence making the outcome most favourable to the agreement but not truly addressing the issue.
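To make the one-sided agreement concrete, here is a minimal sketch under assumptions that are mine rather than the paper's: a Normal mean θ with known variance, a flat (improper) prior, and hypothetical function names. In that toy case the posterior probability of θ≤0 coincides with the p-value for the null θ≤0, and the merged two-sided quantity is just twice the smaller one-sided p-value.

```python
import math


def norm_cdf(z):
    """Standard Normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_prob_nonpositive(xbar, n, sigma=1.0):
    """P(theta <= 0 | data) under a flat prior: theta | data ~ N(xbar, sigma^2/n)."""
    return norm_cdf((0.0 - xbar) / (sigma / math.sqrt(n)))


def pvalue_null_nonpositive(xbar, n, sigma=1.0):
    """One-sided p-value for H0: theta <= 0, computed at the boundary theta = 0."""
    return 1.0 - norm_cdf(math.sqrt(n) * xbar / sigma)


def pvalue_null_nonnegative(xbar, n, sigma=1.0):
    """One-sided p-value for H0: theta >= 0, computed at the boundary theta = 0."""
    return norm_cdf(math.sqrt(n) * xbar / sigma)


def merged_two_sided_pvalue(xbar, n, sigma=1.0):
    """Twice the minimum of the two one-sided p-values."""
    return 2.0 * min(pvalue_null_nonpositive(xbar, n, sigma),
                     pvalue_null_nonnegative(xbar, n, sigma))


# Illustrative (made-up) summary statistics: sample mean 0.3 from n = 25 observations.
xbar, n = 0.3, 25
print(posterior_prob_nonpositive(xbar, n), pvalue_null_nonpositive(xbar, n))
print(merged_two_sided_pvalue(xbar, n))
```

Nothing in this sketch involves a point mass at θ=0, which is the reason the agreement says little about the point-null setting where Lindley’s paradox arises.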

Bertrand-Borel debate

Posted in Books, Statistics on May 6, 2019 by xi'an

On her blog, Deborah Mayo briefly mentioned the Bertrand-Borel debate on the (in)feasibility of hypothesis testing, as reported [and translated] by Erich Lehmann. A first interesting feature is that both [starting with] B mathematicians discuss the probability of causes in the Bayesian spirit of Laplace. With Bertrand considering that the prior probabilities of the different causes are impossible to set and then moving all the way to dismiss the use of probability theory in this setting, nipping the p-values in the bud..! And Borel being rather vague about the solution probability theory has to provide. As stressed by Lehmann.

“The Pleiades appear closer to each other than one would naturally expect. This statement deserves thinking about; but when one wants to translate the phenomenon into numbers, the necessary ingredients are lacking. In order to make the vague idea of closeness more precise, should we look for the smallest circle that contains the group? the largest of the angular distances? the sum of squares of all the distances? the area of the spherical polygon of which some of the stars are the vertices and which contains the others in its interior? Each of these quantities is smaller for the group of the Pleiades than seems plausible. Which of them should provide the measure of implausibility? If three of the stars form an equilateral triangle, do we have to add this circumstance, which is certainly very unlikely apriori, to those that point to a cause?” Joseph Bertrand (p.166)


“But whatever objection one can raise from a logical point of view cannot prevent the preceding question from arising in many situations: the theory of probability cannot refuse to examine it and to give an answer; the precision of the response will naturally be limited by the lack of precision in the question; but to refuse to answer under the pretext that the answer cannot be absolutely precise, is to place oneself on purely abstract grounds and to misunderstand the essential nature of the application of mathematics.” Emile Borel (Chapter 4)

Another highly interesting objection of Bertrand is somewhat linked with his conditioning paradox, namely that the density of the observed unlikely event depends on the choice of the statistic that is used to calibrate the unlikeliness, which makes complete sense in that the information contained in each of these statistics and the resulting probability or likelihood differ to an arbitrary extent, that there are few cases (monotone likelihood ratio) where the choice can be made, and that Bayes factors share the same drawback if they do not condition upon the entire sample. In which case there is no selection of “circonstances remarquables”. Or of uniformly most powerful tests.

mixture modelling for testing hypotheses

Posted in Books, Statistics, University life on January 4, 2019 by xi'an

After a fairly long delay (since the first version was posted and submitted in December 2014), we eventually revised and resubmitted our paper with Kaniav Kamary [who has now graduated], Kerrie Mengersen, and Judith Rousseau on the final day of 2018. The responsibility for this massive delay is mine, as I got fairly depressed by the general tone of the dozen reviews we received after submitting the paper as a Read Paper in the Journal of the Royal Statistical Society. Despite a rather opposite reaction from the community (an admittedly biased sample!) including two dozen citations in other papers. (There seems to be a pattern in my submissions of Read Papers, witness our earlier and unsuccessful attempt with Christophe Andrieu in the early 2000’s with the paper on controlled MCMC, leading to 121 citations so far according to G scholar.) Anyway, thanks to my co-authors keeping up the fight!, we started working on a revision including stronger convergence results, managing to show that the approach leads to an optimal separation rate, contrary to the Bayes factor which has an extra √log(n) factor. This may sound paradoxical since, while the Bayes factor converges to 0 under the alternative model exponentially quickly, the convergence rate of the mixture weight α to 1 is of order 1/√n, but this does not mean that the separation rate of the procedure based on the mixture model is worse than that of the Bayes factor. On the contrary, while it is well known that the Bayes factor leads to a separation rate of order √log(n)/√n in parametric models, we show that our approach can lead to a testing procedure with a better separation rate of order 1/√n. We also studied a non-parametric setting where the null is a specified family of distributions (e.g., Gaussians) and the alternative is a Dirichlet process mixture. Establishing that the posterior distribution concentrates around the null at the rate √log(n)/√n.
We thus resubmitted the paper for publication, although not as a Read Paper, with hopefully more luck this time!
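For readers unfamiliar with the mixture weight α mentioned above, here is a toy sketch of how its posterior can be simulated, under strong simplifying assumptions that are not those of the paper: both competing models are fully specified Normal densities (N(0,1) and N(2,1), chosen only for illustration), the weight gets a Beta(a0,a0) prior with a0=0.5 taken as an assumed default, and the function names are hypothetical. The data are embedded in the mixture α f₁ + (1−α) f₂ and a two-step Gibbs sampler alternates between allocating observations and updating α.

```python
import math
import random


def normal_pdf(x, mu):
    """Density of N(mu, 1) at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def mixture_weight_gibbs(data, mu1=0.0, mu2=2.0, a0=0.5,
                         n_iter=2000, burn_in=500, seed=1):
    """Gibbs draws of alpha in alpha*N(mu1,1) + (1-alpha)*N(mu2,1).

    Both components are fixed (an assumption of this toy example), so the
    only unknowns are the allocations and the weight alpha ~ Beta(a0, a0).
    """
    rng = random.Random(seed)
    f1 = [normal_pdf(x, mu1) for x in data]
    f2 = [normal_pdf(x, mu2) for x in data]
    alpha = 0.5
    draws = []
    for it in range(n_iter):
        # Step 1: allocate each observation to component 1 with its
        # conditional probability given the current weight.
        n1 = 0
        for d1, d2 in zip(f1, f2):
            p1 = alpha * d1 / (alpha * d1 + (1.0 - alpha) * d2)
            if rng.random() < p1:
                n1 += 1
        n2 = len(data) - n1
        # Step 2: conjugate Beta update of the weight given the allocations.
        alpha = rng.betavariate(a0 + n1, a0 + n2)
        if it >= burn_in:
            draws.append(alpha)
    return draws


# Simulated data from the first model, so alpha should concentrate near 1.
data_rng = random.Random(0)
data = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
draws = mixture_weight_gibbs(data)
post_mean = sum(draws) / len(draws)
print("posterior mean of alpha:", post_mean)
```

The posterior of α, rather than a Bayes factor, is then the quantity used to judge which model the data favour; in realistic settings each component has its own parameters, which this sketch deliberately leaves out.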

a question from McGill about The Bayesian Choice

Posted in Books, pictures, Running, Statistics, Travel, University life on December 26, 2018 by xi'an

I received an email from a group of McGill students working on Bayesian statistics and using The Bayesian Choice (although the exercise pictured below is not in the book, the closest being exercise 1.53 inspired from Raiffa and Schlaifer, 1961, and exercise 5.10 as mentioned in the email):

There was a question that some of us cannot seem to decide what is the correct answer. Here are the issues,

Some people believe that the answer to both is ½, while others believe it is 1. The reasoning for ½ is that since Beta is a continuous distribution, we never could have θ exactly equal to ½. Thus regardless of α, the probability that θ=½ in that case is 0. Hence it is ½. I found a related stack exchange question that seems to indicate this as well.

The other side is that by Markov property and mean of Beta(a,a), as α goes to infinity , we will approach ½ with probability 1. And hence the limit as α goes to infinity for both (a) and (b) is 1. I think this also could make sense in another context, as if you use the Bayes factor representation. This is similar I believe to the questions in the Bayesian Choice, 5.10, and 5.11.

As it happens, the answer is ½ in the first case (a) because π(H⁰) is ½ regardless of α and 1 in the second case (b) because the evidence against H⁰ goes to zero as α goes to zero (watch out!), along with the mass of the prior on any compact of (0,1) since Γ(2α)/Γ(α)² then goes to zero. (The limit does not correspond to a proper prior and hence is somewhat meaningless.) However, when α goes to infinity, the evidence against H⁰ goes to infinity and the posterior probability of ½ goes to zero, despite the prior under the alternative being more and more concentrated around ½!
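The vanishing prior mass as α goes to zero is easy to check numerically. The sketch below only looks at the Beta(α,α) prior itself (not at the exercise, which is not reproduced here): it evaluates the normalising constant Γ(2α)/Γ(α)², which behaves like α/2 near zero, and the mass the prior puts on a compact interval, with [0.1, 0.9] an arbitrary choice of mine and a plain midpoint rule standing in for a proper quadrature.

```python
import math


def beta_sym_constant(alpha):
    """Normalising constant Gamma(2*alpha) / Gamma(alpha)^2 of Beta(alpha, alpha)."""
    return math.exp(math.lgamma(2.0 * alpha) - 2.0 * math.lgamma(alpha))


def beta_sym_mass(alpha, lo=0.1, hi=0.9, n_grid=10000):
    """Prior mass of [lo, hi] under Beta(alpha, alpha), by the midpoint rule."""
    h = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid):
        t = lo + (i + 0.5) * h
        total += (t * (1.0 - t)) ** (alpha - 1.0)
    return beta_sym_constant(alpha) * total * h


for alpha in (1.0, 0.1, 0.01, 0.001):
    print(alpha, beta_sym_constant(alpha), beta_sym_mass(alpha))
```

On the compact interval the kernel (θ(1−θ))^(α−1) stays bounded, so the mass shrinks at the same rate as the constant, all the prior weight escaping towards 0 and 1.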
