same data – different models – different answers

Posted in Books, Kids, Statistics, University life on June 1, 2016 by xi'an

An interesting question from a reader of the Bayesian Choice came out on X validated last week. It was about Laplace’s succession rule, which I found somewhat over-used, but it was nonetheless interesting because the question was about the discrepancy between the “non-informative” answers derived from two models applied to the same data: a Hypergeometric distribution in the Bayesian Choice and a Binomial on Wikipedia. The originator of the question had trouble with the difference between those two “non-informative” answers as she or he believed that there was a single non-informative principle that should lead to a unique answer. This does not hold, even when following a reference prior principle like Jeffreys’ invariant rule or Jaynes’ maximum entropy tenets. For instance, the Jeffreys priors associated with a Binomial and a Negative Binomial distribution differ. And it holds even less when considering that there is no unity in reaching those reference priors. (Not even mentioning the issue of the reference dominating measure for the definition of the entropy.) This led to an informative debate, which is the point of X validated.
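To make the Binomial versus Negative Binomial discrepancy concrete, here is a minimal sketch, assuming s successes out of n trials and conjugate Beta updating. It returns the posterior mean of the success probability under Laplace's uniform prior, under the Jeffreys prior for the Binomial, and under the Jeffreys prior for the Negative Binomial (sampling until the s-th success). The function names and the data (s=3, n=10) are purely illustrative, and the Hypergeometric version from the Bayesian Choice is not reproduced here.

```python
from fractions import Fraction

def laplace_rule(s, n):
    """Uniform Beta(1, 1) prior, Binomial likelihood: posterior Beta(s + 1, n - s + 1)."""
    return Fraction(s + 1, n + 2)

def jeffreys_binomial(s, n):
    """Jeffreys prior Beta(1/2, 1/2) for the Binomial: posterior Beta(s + 1/2, n - s + 1/2)."""
    return (s + Fraction(1, 2)) / (n + 1)

def jeffreys_negative_binomial(s, n):
    """Jeffreys prior p^(-1) (1 - p)^(-1/2) for the Negative Binomial (requires s >= 1):
    posterior Beta(s, n - s + 1/2)."""
    return Fraction(s) / (n + Fraction(1, 2))

# Illustrative data: 3 successes in 10 trials (an assumption, not taken from the question).
s, n = 3, 10
print("Laplace (uniform prior):     ", laplace_rule(s, n))                # 1/3
print("Jeffreys, Binomial:          ", jeffreys_binomial(s, n))           # 7/22
print("Jeffreys, Negative Binomial: ", jeffreys_negative_binomial(s, n))  # 2/7
```

The likelihood is the same function of p in both sampling schemes, so the three answers differ only through the prior that each "non-informative" rule attaches to the model.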

On a completely unrelated topic, the survey ship looking for the black boxes of the crashed EgyptAir plane is called the Laplace.

global-local mixtures

Posted in Books, pictures, Running, Statistics, Travel on May 4, 2016 by xi'an

Anindya Bhadra, Jyotishka Datta, Nick Polson and Brandon Willard have arXived this morning a short paper on global-local mixtures. Although the definition given in the paper (p.1) is rather unclear, those mixtures are distributions of a sample that are marginals over component-wise (local) and common (global) parameters. The observations of the sample are (marginally) exchangeable if not independent.

“The Cauchy-Schlömilch transformation not only guarantees an ‘astonishingly simple’ normalizing constant for f(·), it also establishes the wide class of unimodal densities as global-local scale mixtures.”

The paper relies on the Cauchy-Schlömilch identity

$\int_0^\infty f(\{x-g(x)\}^2)\text{d}x=\int_0^\infty f(y^2)\text{d}y\qquad \text{with}\quad g(x)=g^{-1}(x)$

a self-inverse function. This generic result proves helpful in deriving demarginalisations of a Gaussian distribution for densities outside the exponential family like Laplace’s. (This is getting very local for me as Cauchy’s house is up the hill, while Laplace lived two train stations away. Before trains were invented, of course.) And for logistic regression. The paper also briefly mentions Etienne Halphen for his introduction of generalised inverse Gaussian distributions. Halphen, who was one of the rare French Bayesians, worked for the State Electricity Company (EDF) and briefly with Lucien Le Cam (before the latter left for the USA). He introduced some families of distributions during the early 1940’s, including the generalised inverse Gaussian family, which were first presented by his friend Daniel Dugué to the Académie des Sciences, maybe because of the Vichy racial laws… A second result of interest in the paper is that, given a density g and a transform s on positive real numbers that is decreasing and self-inverse, the function f(x)=2g(x-s(x)) is again a density, which can again be represented as a global-local mixture. [I wonder if these representations could be useful in studying the Cauchy conjecture solved last year by Natesh and Xiao-Li.]
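Both results can be checked numerically. The sketch below assumes the self-inverse transform x ↦ 1/x, the test function f(u)=exp(-u) for the identity, and a standard Normal density (hence symmetric) for the second result. These choices, the midpoint rule and the truncation of the integrals at 12 are mine, not the paper's.

```python
import math

def midpoint_integral(func, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))

def self_inverse(x):
    # Decreasing transform on the positive half-line with self_inverse(self_inverse(x)) == x.
    return 1.0 / x

def f(u):
    # Test function for the identity, chosen only for illustration.
    return math.exp(-u)

def normal_density(v):
    # Standard Normal density, used as the symmetric density in the second result.
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

UPPER = 12.0  # all integrands below are numerically negligible beyond this point

# Cauchy-Schlomilch identity: both sides should equal sqrt(pi) / 2 for f(u) = exp(-u).
lhs = midpoint_integral(lambda x: f((x - self_inverse(x)) ** 2), 0.0, UPPER)
rhs = midpoint_integral(lambda y: f(y ** 2), 0.0, UPPER)
exact = math.sqrt(math.pi) / 2.0

# Second result: x -> 2 g(x - s(x)) should integrate to one over the positive reals.
total_mass = midpoint_integral(
    lambda x: 2.0 * normal_density(x - self_inverse(x)), 0.0, UPPER
)

print(lhs, rhs, exact)
print(total_mass)
```

The midpoint rule never evaluates the integrand at zero, so the division in 1/x is always defined.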

Gauss to Laplace transmutation interpreted

Posted in Books, Kids, Statistics, University life on November 9, 2015 by xi'an

Following my earlier post [induced by browsing X validated], on the strange property that the product of a Normal variate by an Exponential variate is a Laplace variate, I got contacted by Peng Ding from UC Berkeley, who showed me how to derive the result by a mere algebraic transform, related to the decomposition

(X+Y)(X-Y)=X²-Y² ~ 2XY

when X,Y are iid Normal N(0,1). Peng Ding and Joseph Blitzstein have now arXived a note detailing this derivation, along with another derivation using the moment generating function. As a coincidence, I also came across another interesting representation on X validated, namely that, when X and Y are Normal N(0,1) variates with correlation ρ,

XY ~ R(cos(πU)+ρ)

with R Exponential and U Uniform (0,1). As shown by the OP of that question, it is a direct consequence of the decomposition of (X+Y)(X-Y) and of the polar or Box-Muller representation. This does not lead to a standard distribution of course, but remains a nice representation of the product of two Normals.
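A quick Monte Carlo check of this representation is sketched below, assuming that R is Exponential with mean one and independent of U, which is the scaling that makes both sides have mean ρ. The value ρ=0.5, the seed and the sample size are arbitrary choices of mine. Both samples should show a mean close to ρ and a variance close to 1+ρ².

```python
import math
import random

def mean_and_variance(sample):
    """Empirical mean and (population) variance of a list of floats."""
    mean = math.fsum(sample) / len(sample)
    variance = math.fsum((v - mean) ** 2 for v in sample) / len(sample)
    return mean, variance

def simulate(rho, n, seed=2015):
    """Draw n copies of XY and of R(cos(pi U) + rho), independently of each other."""
    rng = random.Random(seed)
    products, representations = [], []
    for _ in range(n):
        # Correlated standard Normals built from two independent ones.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        products.append(x * y)
        # Assumed scaling: R ~ Exponential with rate 1, U ~ Uniform(0, 1).
        r = rng.expovariate(1.0)
        u = rng.random()
        representations.append(r * (math.cos(math.pi * u) + rho))
    return products, representations

rho = 0.5
products, representations = simulate(rho, 200_000)
mean_xy, var_xy = mean_and_variance(products)
mean_rep, var_rep = mean_and_variance(representations)

print("XY:               ", mean_xy, var_xy)
print("R(cos(pi U)+rho): ", mean_rep, var_rep)
print("theory:           ", rho, 1.0 + rho ** 2)
```

Matching two moments is of course no proof of equality in distribution, only a sanity check on the scaling of R.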

Gauss to Laplace transmutation!

Posted in Books, Kids, Statistics, University life on October 14, 2015 by xi'an

When browsing X validated the other day [translate as procrastinating!], I came upon the strange property that the marginal distribution of a zero-mean Normal variate with Exponential variance is a Laplace distribution. I first thought there was a mistake since we usually take an inverse Gamma on the variance parameter, not a Gamma. But in that case the marginal is a t distribution. The result is curious and can be expressed in a variety of ways:

– the product of a χ²₁ and of a χ₂ is a χ²₂;
– the determinant of a 2×2 normal matrix is a Laplace variate;
– a difference of exponentials is Laplace…
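These statements are easy to check by simulation. The sketch below assumes one particular normalisation: a variance that is Exponential with mean 2 (rate 1/2), a 2×2 matrix with iid N(0,1) entries, and two iid Exponential(1) variates, all of which should then produce a standard Laplace variate X with P(|X|>1)=exp(-1). The seed, the sample size and the choice of this tail probability as the yardstick are mine.

```python
import math
import random

def tail_frequency(sample, threshold=1.0):
    """Fraction of the sample with absolute value above the threshold."""
    return sum(abs(v) > threshold for v in sample) / len(sample)

rng = random.Random(1749)
n = 200_000

# Zero-mean Normal with Exponential variance (rate 1/2, i.e. mean 2).
mixture = [math.sqrt(rng.expovariate(0.5)) * rng.gauss(0.0, 1.0) for _ in range(n)]

# Determinant ad - bc of a 2x2 matrix with iid N(0, 1) entries.
determinant = [
    rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0)
    for _ in range(n)
]

# Difference of two iid Exponential(1) variates.
difference = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

target = math.exp(-1.0)  # P(|X| > 1) for a standard Laplace variate
for name, sample in (
    ("Normal scale mixture", mixture),
    ("2x2 determinant", determinant),
    ("difference of exponentials", difference),
):
    print(name, tail_frequency(sample), "vs", target)
```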

The OP was asking for a direct proof of the result and I eventually sorted it out by a series of changes of variables, although there exists a much more elegant and general proof by Mike West, then at the University of Warwick, based on characteristic functions (or Fourier transforms). It reminded me that continuous, unimodal [at zero] and symmetric densities were necessarily scale mixtures [a wee misnomer] of Gaussians. Mike proves in this paper that exponential power densities [including both the Normal and the Laplace cases] correspond to the variances having an inverse positive stable distribution with half the power. And this is a straightforward consequence of the exponential power density being proportional to the Fourier transform of a stable distribution and of a Fubini inversion. (Incidentally, the processing times of Biometrika were not that impressive at the time, with a 2-page paper submitted in Dec. 1984 published in Sept. 1987!)

This is a very nice and general derivation, but I still miss the intuition as to why it happens that way. But then, I know nothing, and even less about products of random variates!

Laplace great⁶-grand child!

Posted in Kids, pictures, Statistics, University life on August 3, 2015 by xi'an

Looking at the Family Tree application (which I discovered via Peter Coles’ blog), I just found out that I was Laplace’s [academic] great-great-great-great-great-great-great-grand-child! Through Poisson and Chasles. Going even further, as Siméon Poisson was also advised by Lagrange, my academic lineage reaches Euler and the Bernoullis. Pushing ever further, I even found William of Ockham along one of the “direct” branches! Amazing ancestry, to which my own deeds pay little homage if any… (However, I somewhat doubt the strength of the links for the older names, since pursuing them ends up at John the Baptist!)

I wonder how many other academic descendants of Laplace are alive today. Too bad Family Tree does not seem to offer this option! Given the longevity of both Laplace and Poisson, they presumably taught many students, which means a lot of my colleagues and even of my Bayesian colleagues should share the same illustrious ancestry. For instance, I share part of this ancestry with Gérard Letac. And both Jean-Michel Marin and Arnaud Guillin. Actually, checking with the Mathematics Genealogy Project, I see that Laplace had… one student!, but still a grand total of [at least] 85,738 descendants… Incidentally, looking at the direct line, most of those had very few [recorded] descendants.

eliminating an important obstacle to creative thinking: statistics…

Posted in Books, Kids, Statistics, University life on March 12, 2015 by xi'an

“We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking.”

About a month ago, David Trafimow and Michael Marks, the current editors of the journal Basic and Applied Social Psychology, published an editorial banning all null hypothesis significance testing procedures (acronym-ed into the ugly NHSTP, which sounds like a particularly nasty venereal disease!) from papers published by the journal. My first reaction was “Great! This will bring more substance to the papers by preventing significance fishing and undisclosed multiple testing! Power to the statisticians!” However, after reading the said editorial, I realised it was inspired by a nihilistic anti-statistical stance, backed by an apparent lack of understanding of the nature of statistical inference, rather than a call for saner and safer statistical practice. The editors most clearly state that inferential statistical procedures are no longer needed to publish in the journal, only “strong descriptive statistics”. Maybe to keep in tune with the “Basic” in the name of the journal!

“In the NHSTP, the problem is in traversing the distance from the probability of the finding, given the null hypothesis, to the probability of the null hypothesis, given the finding. Regarding confidence intervals, the problem is that, for example, a 95% confidence interval does not indicate that the parameter of interest has a 95% probability of being within the interval.”

The above quote could be a motivation for a Bayesian approach to the testing problem, a revolutionary stance for journal editors!, but it only illustrates that the editors wish for a procedure that would eliminate the uncertainty inherent to statistical inference, i.e., to decision making under… erm, uncertainty: “The state of the art remains uncertain.” To fail to separate significance from certainty is fairly appalling from an epistemological perspective and should be a case for impeachment, were any such thing to exist for a journal board. This means the editors cannot distinguish data from parameter and model from reality! Even more fundamentally, to bar statistical procedures from being used in a scientific study is nothing short of reactionary. While encouraging the inclusion of data is a step forward, restricting the validation or invalidation of hypotheses to gazing at descriptive statistics is many steps backward and completely jeopardizes the academic reputation of the journal, whose editorial may end up being its last quoted paper. Is deconstruction now reaching psychology journals?! To quote from a critic of this approach, “Thus, the general weaknesses of the deconstructive enterprise become self-justifying. With such an approach I am indeed not sympathetic.” (Searle, 1983).

“The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist (…) With respect to Bayesian procedures, we reserve the right to make case-by-case judgments, and thus Bayesian procedures are neither required nor banned from BASP.”

The section on Bayesian approaches is trying to be sympathetic to the Bayesian paradigm but again reflects the poor understanding of the authors. By “Laplacian assumption”, they mean Laplace’s Principle of Indifference, i.e., the use of uniform priors, which has not been seriously considered a sound principle since the mid-1930’s. Except maybe in recent papers of Trafimow. I also love the notion of “generat[ing] numbers where none exist”, as if the prior distribution had to be grounded in some physical reality! Although it is meaningless, it has some poetic value… (Plus, bringing Popper and Fisher to the rescue sounds like shooting Bayes himself in the foot.) At least, the fact that the editors will consider Bayesian papers on a case-by-case basis indicates they may engage in a subjective Bayesian analysis of each paper rather than using an automated p-value against the 100% rejection bound!

[Note: this entry was suggested by Alexandra Schmidt, current ISBA President, towards an upcoming column on this decision of Basic and Applied Social Psychology for the ISBA Bulletin.]

re-re-relevant statistics for ABC model choice

Posted in Books, Statistics, University life on March 18, 2013 by xi'an

After a very, very long delay, we eventually re-revised our paper about necessary and sufficient conditions on summary statistics to be relevant for model choice (i.e. to lead to consistent tests). Reasons, both good and bad, abound for this delay! Some (rather bad) were driven by the completion of a certain new edition… Some (fairly good) are connected with the requests from the Series B editorial team, towards improving our methodological input. As a result we put more emphasis on the post-ABC cross-checking for the relevance of the summary choice, via a predictive posterior evaluation of the means of the summary statistic under both models and a test for mean equality. And re-ran a series of experiments on a three-population population genetics example. Plus, on the side, simplified some of our assumptions. I dearly hope the paper can make it through but am also looking forward to the opinion of the Series B editorial team. The next version of Relevant statistics for Bayesian model choice should be arXived by now (meaning when this post appears!).