## Archive for Haldane's prior

## objectivity in prior distributions for the multinomial model

Posted in Statistics, University life with tags Haldane's prior, Laplace's prior, multinomial distribution, non-informative priors, objective Bayes, prior selection on March 17, 2016 by xi'an

**T**oday, Danilo Alvares, visiting from the Universitat de València, gave a talk at CREST about choosing a prior for the multinomial distribution by comparing different Dirichlet priors. In a sense this is a hopeless task: first, because there is no reason to pick a particular prior unless one adopts a very specific and a-Bayesian criterion to discriminate between priors; second, because the multinomial is a weird distribution, hardly a distribution at all in that it results from grouping observations into classes, often based on the observations themselves. A construction that should perhaps be included within the choice of the prior? But there lurks the danger of ending up with a data-dependent prior. My other remark about this problem is that, among the token priors, Perks' prior using 1/k as its hyper-parameter [where k is the number of categories] is rather difficult to justify compared with 1/k² or 1/k³, except, to some extent, for aggregation consistency. And Laplace's prior gets highly concentrated as the number of categories grows.

## uniform correlation mixtures

Posted in Books, pictures, Statistics, University life with tags Box-Muller algorithm, Haldane's prior, infinite norm, Jeffreys priors, Khintchine representation, marginalisation, Mathias Drton, normal distribution, Philadelphia, Wharton Business School on December 4, 2015 by xi'an

**K**ai Zhang and my friends from Wharton, Larry Brown, Ed George and Linda Zhao arXived last week a neat mathematical foray into the properties of a marginal bivariate Gaussian density once the correlation ρ is integrated out. While the univariate marginals remain Gaussian (unsurprising, since these marginals do not depend on ρ in the first place), the joint density has the surprising property of being

[1-Φ(max{|x|,|y|})]/2

which turns an infinitely regular density into a density that is not even everywhere differentiable. And which is constant on squares rather than on circles or ellipses. This is somewhat paradoxical in that the intuition (at least my intuition!) is that integration increases regularity… I also like the characterisation of the distributions factorising through the infinite norm as scale mixtures of the infinite-norm equivalent of normal distributions. The paper proposes several threads for extensions of this most surprising result. Others come to mind:

- What happens when the Jeffreys prior is used in place of the uniform? Or Haldane's prior?
- Given the scale-mixture representation of t distributions, is there an equivalent result for t variates?
- Is there any connection with the equally surprising resolution of the Drton conjecture by Natesh Pillai and Xiao-Li Meng?
- In the Khintchine representation, correlated normal variates are created by multiplying the square root of a single χ²(3) variate by a vector of independent uniforms on (-1,1). What are the resulting variates for other degrees of freedom k in the χ²(k) variate?
- I also wonder at a connection between this Khintchine representation and the Box-Muller algorithm, as in this earlier X validated question that I turned into an exam problem.
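The Khintchine representation in the penultimate point is easy to check by simulation: a standard normal variate arises as the square root of a χ²(3) variate times an independent uniform on (-1,1), and the same radial variable can be shared by two coordinates. A minimal sketch in plain Python (function name and sample size are mine):

```python
import math
import random

random.seed(42)

def khintchine_pair():
    """One draw of two variates sharing a single chi-squared(3) radial part."""
    # chi-squared(3) as a sum of three squared standard normals
    s = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(3))
    r = math.sqrt(s)
    # the same radius multiplies two independent uniforms on (-1, 1)
    return r * random.uniform(-1.0, 1.0), r * random.uniform(-1.0, 1.0)

# each marginal should then be standard normal
n = 200_000
xs = [khintchine_pair()[0] for _ in range(n)]
mean = sum(xs) / n
var = sum(x * x for x in xs) / n - mean ** 2
```

The empirical mean and variance should come out near 0 and 1, as Archimedes' theorem makes the first coordinate of a uniform point on the sphere a Uniform(-1,1) variate independent of the radius.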

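The closed-form mixture density quoted above can also be verified numerically: average the bivariate normal density over ρ ~ Uniform(-1,1) and compare with [1-Φ(max{|x|,|y|})]/2. A quick sketch, standard library only (function names are mine; the midpoint rule conveniently avoids the degenerate endpoints ρ = ±1):

```python
import math

def phi2(x, y, rho):
    """Bivariate normal density with unit variances and correlation rho."""
    q = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mixture_density(x, y, n=100_000):
    """Average phi2 over rho ~ Uniform(-1, 1), midpoint rule."""
    h = 2.0 / n
    return sum(phi2(x, y, -1.0 + (i + 0.5) * h) for i in range(n)) * h / 2.0

x, y = 1.0, 0.5
closed_form = (1.0 - Phi(max(abs(x), abs(y)))) / 2.0
numerical = mixture_density(x, y)
```

Evaluating the mixture at another point with the same infinite norm, say (0.2, 1.0), returns the same value, illustrating the square-shaped contours.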
## on the origin of the Bayes factor

Posted in Books, Statistics with tags Bayes factors, full Bayesian significance test, Haldane's prior, Harold Jeffreys, Jack Haldane, Jeffreys priors, non-informative priors, scientific inference on November 27, 2015 by xi'an

**A**lexander Etz and Eric-Jan Wagenmakers from the Department of Psychology of the University of Amsterdam just arXived a paper on the invention of the Bayes factor. In particular, they highlight the role of John Burdon Sanderson (J.B.S.) Haldane in the development of this central tool for Bayesian comparison of hypotheses. In short, Haldane used a Bayes factor before Jeffreys did!

“The idea of a significance test, I suppose, putting half the probability into a constant being 0, and distributing the other half over a range of possible values.” H. Jeffreys

The authors analyse Jeffreys’ 1935 paper on significance tests, which appears to be the very first occurrence of a Bayes factor in his bibliography, testing whether or not two probabilities are equal. They also show the roots of this derivation in earlier papers by Dorothy Wrinch and Harold Jeffreys, as early as 1919. [As an “aside”, the early contributions of Dorothy Wrinch to the foundations of 20th Century Bayesian statistics are hardly acknowledged. A shame, considering they constitute the basis and more of Jeffreys’ 1931 *Scientific Inference*, Jeffreys who wrote in her obituary “I should like to put on record my appreciation of the substantial contribution she made to [our joint] work, which is the basis of all my later work on scientific inference.” In retrospect, Dorothy Wrinch should have been co-author of this book…] These early papers by Wrinch and Jeffreys are foundational in that they elaborate a construction of prior distributions that would eventually see the Jeffreys non-informative prior as its final solution [*Jeffreys priors* that should be called *Lhoste’s priors* according to Steve Fienberg, although I think Ernest Lhoste only considered a limited number of transformations in his invariance rule]. The 1921 paper contains the Bayes factor *de facto*, but it does not appear to be advocated as a tool *per se* for conducting significance tests.

“The historical records suggest that Haldane calculated the first Bayes factor, perhaps almost by accident, before Jeffreys did.” A. Etz and E.J. Wagenmakers

As another interesting aside, the historical account points out that Jeffreys came up in 1931 with what is now called Haldane’s prior for a binomial proportion, a prior proposed by Haldane in 1931 (when his paper was read) and 1932 (when it was published in the *Mathematical Proceedings of the Cambridge Philosophical Society*). The problem tackled by Haldane is again a significance test on a binomial probability. Contrary to the authors, I find the original (quoted) text quite clear, with a prior split between a uniform on [0,½] and a point mass at ½. Haldane uses posterior odds [of 34.7] to compare both hypotheses but… I see no trace in the quoted material that he ends up using the Bayes factor as such, that is, as his decision rule. (I acknowledge *decision rule* is anachronistic in this setting.) On the side, Haldane also implements model averaging. Hence my reading of this reading of the 1930’s literature is that it remains unclear whether Haldane perceived the Bayes factor as a Bayesian [another anachronism] inference tool, upon which [and only which] significance tests could be conducted. That Haldane had a remarkably modern view of splitting the prior according to two orthogonal measures and of correctly deriving the posterior odds is quite clear. With the very neat trick of removing the infinite integral at p=0, an issue Jeffreys was fighting with at the same time. In conclusion, I would thus rephrase the major finding of this paper as: Haldane should get priority for deriving the Bayesian significance test for point null hypotheses, rather than for deriving the Bayes factor. But this may be my biased view of Bayes factors speaking there…
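Haldane's split-prior construction is simple enough to reproduce: half the prior mass sits at p = ½ and the other half is spread uniformly on [0,½], so the posterior odds follow from two marginal likelihoods. A toy sketch with made-up counts (the 34.7 above comes from Haldane's own data, which I do not reproduce here; function names are mine):

```python
import math

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def posterior_odds(x, n, grid=50_000):
    """Posterior odds of H0: p = 1/2 against H1: p ~ Uniform(0, 1/2),
    with equal prior weights 1/2 on each, so the odds equal the Bayes factor."""
    m0 = binom_pmf(x, n, 0.5)
    h = 0.5 / grid
    # marginal likelihood under H1: uniform density 2 on (0, 1/2), midpoint rule
    m1 = 2.0 * sum(binom_pmf(x, n, (i + 0.5) * h) for i in range(grid)) * h
    return m0 / m1

odds = posterior_odds(5, 10)  # perfectly balanced data favour the point null
```

With 5 successes out of 10 the odds favour the point null, while more extreme counts (say 2 out of 10) tilt the odds towards the uniform alternative.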

Another amazing fact I gathered from the historical work of Etz and Wagenmakers is that Haldane and Jeffreys were geographically very close while working on the same problem and hence could have known of and referenced each other’s work. Which did not happen.

## approximation of improper by vague priors

Posted in Statistics, University life with tags Haar measure, Haldane's prior, John Burdon Sanderson Haldane, Lebesgue measure on November 18, 2013 by xi'an

“…many authors prefer to replace these improper priors by vague priors, i.e. probability measures that aim to represent very few knowledge on the parameter.”

**C**hristèle Bioche and Pierre Druilhet arXived a few days ago a paper with this title. They aim at bringing new light on the convergence of vague priors to their limit. Their notion of convergence is a pointwise convergence in the quotient space of Radon measures, the quotient being defined by the removal of the “normalising” constant. The first results contained in the paper do not show particularly enticing properties of the improper limit of proper measures, as the limit cannot be given any (useful) probabilistic interpretation. (A feature already noticeable when reading Jeffreys.) The first result that truly caught my interest in connection with my current research is the fact that Haar measures appear as a (weak) limit of conjugate priors (Section 2.5). And that the Jeffreys prior is the limit of the parametrisation-free conjugate priors of Druilhet and Pommeret (2012, Bayesian Analysis, a paper I will discuss soon!). The result about the convergence of posterior means is rather anticlimactic, as the basic assumption is the uniform integrability of the sequence of prior densities. An interesting counterexample (somehow familiar to invariance fans): the sequence of Poisson distributions with mean n has no weak limit. And the Haldane prior does appear as a limit of Beta distributions (less surprising). On (0,1) if not on [0,1].
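The Beta-to-Haldane limit at the end of the paragraph is easily illustrated: under a Beta(ε,ε) prior and x successes out of n Bernoulli trials, the posterior is Beta(x+ε, n−x+ε), whose mean tends to the MLE x/n as ε goes to zero. A two-line check (notation and counts are mine):

```python
def beta_posterior_mean(x, n, eps):
    """Posterior mean of a binomial proportion under a Beta(eps, eps) prior."""
    return (x + eps) / (n + 2.0 * eps)

# as eps -> 0 the Beta(eps, eps) prior approaches Haldane's improper prior
# and the posterior mean approaches the MLE x/n
means = [beta_posterior_mean(7, 10, eps) for eps in (1.0, 0.1, 0.001, 1e-9)]
```

The first entry (ε = 1, i.e. Laplace's uniform prior) gives 8/12 ≈ 0.667, and the sequence increases monotonically towards 0.7 as ε shrinks.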

The paper contains a section on the Jeffreys-Lindley paradox, which is only considered from the second perspective, the one I favour. There is however a mention made of the noninformative answer, which is the (meaningless) one associated with the Lebesgue measure of normalising constant one. This Lebesgue measure also appears as a weak limit in the paper, even though the limit of the posterior probabilities is 1. Except when the likelihood has bounded variations outside compacts. Then the limit of the probabilities is the prior probability of the null… Interesting, truly, but not compelling enough to change my perspective on the topic. *(And thanks to the authors for their thanks!)*