**S**urprisingly (or not?!), I received two requests about exercises from The Bayesian Choice: one from a group of students at McGill having difficulties solving the above, wondering about the properness of the posterior (but missing the integration in x), to whom I sent back this correction; and another one from the Czech Republic about a difficulty with the term “evaluation”, by which I meant (pardon my French!) estimation.

## Archive for improper prior

## back to the Bayesian Choice

Posted in Books, Kids, Statistics, University life with tags autoregressive model, Bayesian decision theory, Book, exercises, improper posteriors, improper prior, inverse Gamma distribution, prior predictive, The Bayesian Choice on October 17, 2018 by xi'an

## A new approach to Bayesian hypothesis testing

Posted in Books, Statistics with tags Bayes factor, Bayesian decision theory, Bayesian tests of hypotheses, deviance, improper prior, Kullback-Leibler divergence on September 8, 2016 by xi'an

“The main purpose of this paper is to develop a new Bayesian hypothesis testing approach for the point null hypothesis testing (…) based on the Bayesian deviance and constructed in a decision theoretical framework. It can be regarded as the Bayesian version of the likelihood ratio test.”

**T**his paper got published in *Journal of Econometrics* two years ago but I only read it a few days ago when Kerrie Mengersen pointed it out to me. Here is an interesting criticism of Bayes factors.

“In the meantime, unfortunately, Bayes factors also suffers from several theoretical and practical difficulties. First, when improper prior distributions are used, Bayes factors contains undefined constants and takes arbitrary values (…) Second, when a proper but vague prior distribution with a large spread is used to represent prior ignorance, Bayes factors tends to favour the null hypothesis. The problem may persist even when the sample size is large (…) Third, the calculation of Bayes factors generally requires the evaluation of marginal likelihoods. In many models, the marginal likelihoods may be difficult to compute.”

I completely agree with these points, which are part of a longer list in our testing by mixture estimation paper. The authors also rightly blame the rigidity of the 0-1 loss function behind the derivation of the Bayes factor. An alternative decision-theoretic criterion based on the Kullback-Leibler distance was proposed by José Bernardo and Raúl Rueda in a 2002 paper, evaluating the average divergence between the null and the full under the full, with the slight drawback that any nuisance parameter has the same prior under both hypotheses. (Which makes me think of the Savage-Dickey paradox, since everything here seems to take place under the alternative.) And the larger drawback of requiring a lower bound for rejecting the null. (Although it could be calibrated under the null prior predictive.)

This paper suggests using instead the difference of the Bayesian deviances, which is the expected log ratio integrated against the posterior. (With the possible embarrassment of the quantity having no prior expectation, since the ratio depends on the data. But after all the evidence or marginal likelihood faces the same “criticism”.) So it is a sort of Bayes factor on the logarithms, with a strong similarity with Bernardo & Rueda’s solution since they are equal in expectation under the marginal. As in Dawid et al.’s recent paper, the logarithm removes the issue with the normalising constant and with the Lindley-Jeffreys paradox. The approach then needs to be calibrated in order to define a decision bound about the null. The asymptotic distribution of the criterion is χ²(p)−p, where p is the dimension of the parameter to be tested, but this sounds like falling back on frequentist tests. And the deadly 0.05 bounds. I would rather favour a calibration of the criterion using prior or posterior predictives under both models…
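To make the criterion concrete, here is a minimal sketch of my own (a toy example, not the paper's code) for a point null on a normal mean with known variance under a flat prior, where the posterior-expected deviance difference has the closed form n·x̄² − 1, in line with the χ²(1) − 1 calibration mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (my own illustration): X_i ~ N(theta, 1), point null H0: theta = 0,
# flat prior, hence posterior theta | x ~ N(xbar, 1/n).
n = 50
x = rng.normal(0.0, 1.0, size=n)   # data generated under the null
xbar = x.mean()

# Deviance difference T = E_post[ 2 log( L(theta) / L(theta_0) ) ],
# estimated by Monte Carlo over posterior draws; log L(theta) reduces to
# -n/2 (xbar - theta)^2 up to a constant via the sufficient statistic.
theta = rng.normal(xbar, 1.0 / np.sqrt(n), size=100_000)
log_ratio = -0.5 * n * ((xbar - theta) ** 2 - xbar ** 2)
T = np.mean(2.0 * log_ratio)

# Closed form in this toy case: T = n * xbar**2 - 1, i.e. chi2(1) - 1 under H0.
print(T, n * xbar ** 2 - 1.0)
```

In this conjugate setting the Monte Carlo estimate and the closed form agree; in general the posterior draws would come from an MCMC sampler instead.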

## Measuring statistical evidence using relative belief [book review]

Posted in Books, Statistics, University life with tags ABC, Bayes factor, CHANCE, CRC Press, discrepancies, Error and Inference, improper prior, integrated likelihood, Jeffreys-Lindley paradox, Likelihood Principle, marginalisation paradoxes, model checking, model validation, Monty Hall problem, Murray Aitkin, p-value, point null hypotheses, relative belief ratio, University of Toronto on July 22, 2015 by xi'an

“It is necessary to be vigilant to ensure that attempts to be mathematically general do not lead us to introduce absurdities into discussions of inference.” (p.8)

**T**his new book by Michael Evans (Toronto) summarises his views on statistical evidence (expanded in a large number of papers), which are a quite unique mix of Bayesian principles and less-Bayesian methodologies. I am quite glad I could receive a version of the book before it was published by CRC Press, thanks to Rob Carver (and Keith O’Rourke for warning me about it). *[Warning: this is a rather long review and post, so readers may choose to opt out now!]*

“The Bayes factor does not behave appropriately as a measure of belief, but it does behave appropriately as a measure of evidence.” (p.87)

## the Flatland paradox [reply from the author]

Posted in Books, Statistics, University life with tags Abbot, flat prior, Flatland, Gaussian random walk, improper prior, marginalisation paradoxes, Mervyn Stone on May 15, 2015 by xi'an

*[Here is a reply by Pierre Druihlet to my comments on his paper.]*

**T**here are several goals in the paper, the last one being the most important one.

The first one is to insist that considering θ as a parameter is not appropriate. We are in complete agreement on that point, but I prefer considering l(θ) as the parameter rather than N, mainly because it is much simpler. Knowing N, the law of l(θ) is that of a random walk with 0 as a reflecting boundary (Jaynes, in his book, explores this link). So, for a given prior on N, we can derive a prior on l(θ). Since the random process that generates N is completely unknown, except that N is probably large, the true law of l(θ) is completely unknown as well, so we may directly consider l(θ).
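A quick simulation can illustrate this construction (my own sketch, assuming the standard Flatland setup in which one of four moves is drawn uniformly at each step and cancels the previous move with probability 1/4): given N, the reduced length l(θ) behaves as a random walk reflected at 0, drifting upwards at rate 1/2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch (not from the paper): l goes up with probability 3/4 and down with
# probability 1/4, except at the reflecting boundary 0 where it must go up.
def simulate_l(N, rng):
    l = 0
    for _ in range(N):
        if l == 0 or rng.random() < 0.75:
            l += 1
        else:
            l -= 1
    return l

N = 1000
draws = np.array([simulate_l(N, rng) for _ in range(2000)])
print(draws.mean() / N)   # net drift (3/4 - 1/4) = 1/2, so l(theta) ~ N/2
```

The point of the reply survives in the simulation: a prior on N induces, through this walk, a perfectly well-defined prior on l(θ).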

The second one is to state explicitly that a flat prior on θ implies an exponentially increasing prior on l(θ). As an anecdote, Stone warned in 1972 against this kind of prior for Gaussian models. Another interesting anecdote is that he cited Abbott’s novel *Flatland: A Romance of Many Dimensions*, which describes a world where the dimension is changed. This is exactly the case in the Flatland paradox, since θ has to be seen in two dimensions rather than in one dimension.

The third one is to make a distinction between randomness of the parameter and prior distribution, each one having its own rule. This point is extensively discussed in Section 2.3.

– In the intuitive reasoning, the probability of no annihilation involves the true joint distribution on (θ, x) and therefore the true, unknown distribution of θ.

– In the Bayesian reasoning, the posterior probability of no annihilation is derived from the prior distribution, which is improper. The underlying idea is that a prior distribution does not obey probability rules but belongs to a projective space of measures. This is especially true if the prior does not represent accurate knowledge. In that case, there is no discontinuity between proper and improper priors, and therefore the impropriety of the distribution is not a key point. In that context, the joint and marginal distributions are irrelevant, not because the prior is improper, but because it is a prior and not a true law. If the prior were the true probability law of θ, then the flat distribution could not be considered as a limit of probability distributions.

For most applications, the distinction between prior and probability law is not necessary and even pedantic, but it may appear essential in some situations. For example, in the Jeffreys-Lindley paradox, we may note that the construction of the prior is not compatible with the projective space structure.

## improper priors, incorporated

Posted in Books, Statistics, University life with tags Annals of Statistics, Bayes factor, Bayes theorem, countable measure, empirical Bayes methods, improper prior, marginalisation paradoxes, Poisson point process, random set on January 11, 2012 by xi'an

“If a statistical procedure is to be judged by a criterion such as a conventional loss function (…) we should not expect optimal results from a probabilistic theory that demands multiple observations and multiple parameters.” P. McCullagh & H. Han

**P**eter McCullagh and Han Han have just published in the Annals of Statistics a paper on *Bayes’ theorem for improper mixtures*. This is a fascinating piece of work, even though some parts do elude me… The authors indeed propose a framework based on Kingman’s Poisson point processes that allows one to include (countable) improper priors in a coherent probabilistic framework. This framework requires the definition of a test set A in the sampling space, the observations being then the events Y ∩ A, Y being an infinite random set when the prior is infinite. It is therefore complicated to perceive this representation in a genuine Bayesian framework, i.e. for a single observation corresponding to a single parameter value. In that sense it seems closer to the original empirical Bayes, *à la* Robbins.

“An improper mixture is designed for a generic class of problems, not necessarily related to one another scientifically, but all having the same mathematical structure.” P. McCullagh & H. Han

**T**he paper thus misses in my opinion a clear link with the design of improper priors. And it does not offer a resolution of the improper prior Bayes factor conundrum. However, it provides a perfectly valid environment for working with improper priors. For instance, the final section on the marginalisation “paradoxes” is illuminating in this respect as it does not demand using a limit of proper priors.

## Is Bayes posterior [quick and] dirty?!

Posted in Books, Statistics, University life with tags Bayesian inference, credible intervals, foundations, improper prior, Thomas Bayes on April 28, 2011 by xi'an

**I** have been asked to discuss the upcoming *Statistical Science* paper by Don Fraser, “*Is Bayes posterior quick and dirty confidence?*”. The title was intriguing, if clearly provocative, and so I read through the whole paper… (*The following is a draft of my discussion.*)

**T**he central point in Don’s paper seems to be a demonstration that Bayesian confidence sets are not valid because they do not provide the proper frequentist coverage. While I appreciate the effort made therein of evaluating Bayesian bounds in a frequentist light, and while Don’s paper does shed new insight on this evaluation, the main point of the paper seems to be a radical reexamination of the relevance of the whole Bayesian approach to confidence regions. The outcome is rather surprising in that the disagreement between Bayesian and frequentist perspectives is usually quite limited *[in contrast with tests]*, in that the coverage statements agree to orders between O(n^{-1/2}) and O(n^{-1}), following older results by Welch and Peers (1963).
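To see the kind of coverage discrepancy at stake, here is a small simulation of my own (not Fraser's example): the frequentist coverage of an equal-tailed 95% credible interval in an exponential model with a flat prior on the rate, which falls close to, but not exactly at, 95% for small n.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustration (my own, hedged): X_i ~ Exponential(rate = theta), flat prior on
# theta, so theta | x ~ Gamma(shape = n + 1, rate = sum(x)). We count how often
# the 95% equal-tailed posterior interval covers the true rate across datasets.
theta0, n, reps = 2.0, 10, 4000
hits = 0
for _ in range(reps):
    x = rng.exponential(1.0 / theta0, size=n)          # numpy uses scale = 1/rate
    post = rng.gamma(n + 1, 1.0 / x.sum(), size=5000)  # posterior draws
    lo, hi = np.quantile(post, [0.025, 0.975])
    hits += lo <= theta0 <= hi
print(hits / reps)   # near 0.95, but not exactly, in this non-location model
```

In a pure location model the flat-prior credible interval would match the confidence interval exactly; the small mismatch here is of the asymptotic order discussed above.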

## Thesis defense in València

Posted in Statistics, Travel, University life, Wines with tags Bayes factor, Bayesian model choice, Harold Jeffreys, hyper-g priors, improper prior, invariance, València on February 25, 2011 by xi'an

**O**n Monday, I took part in the jury of the PhD thesis of Anabel Forte Deltel, in the department of statistics of the Universitat de València. The topic of the thesis was variable selection in Gaussian linear models using an objective Bayes approach. Completely on my own research agenda! I had already discussed with Anabel in Zürich, where she gave a poster and gave me a copy of her thesis, so I could concentrate on the fundamentals of her approach during the defense. Her approach extends the hyper-g prior of Liang et al. (2008, JASA) into a complete analysis of the conditions set by Jeffreys in his book for constructing such priors. She is therefore able to motivate a precise value for most hyperparameters (although some choices were mainly based on computational reasons, opposing the *₂F₁* with Appell’s *F₁* hypergeometric functions). She also defends the use of an improper prior by an invariance argument that leads to the standard Jeffreys prior on location-scale. (This is where I prefer the approach that does not discriminate between a subset of the covariates including the intercept and the other covariates, even though it is not invariant under location-scale transforms.) After the defence, Jim Berger pointed out to me that the modelling allowed for this subset to be empty, which would then cancel my above objection! In conclusion, this thesis could well set a reference prior (if not in José Bernardo’s sense of the term!) for Bayesian linear model analysis in the coming years.

**Bayesian Core**