Archive for Pierre Simon de Laplace

Bayesian methods in cosmology

Posted in Statistics on January 18, 2017 by xi'an

A rather massive document was arXived a few days ago by Roberto Trotta on Bayesian methods for cosmology, in conjunction with an earlier winter school, the 44th Saas Fee Advanced Course on Astronomy and Astrophysics, “Cosmology with wide-field surveys”. While I never had the opportunity to teach at a winter school in Saas Fee, I will give a course on ABC next month to statistics graduates in another Swiss dream location, Les Diablerets. And next Fall a course on ABC again, but to astronomers and cosmologists, in Autrans, near Grenoble.

The course document is an 80-page introduction to probability and statistics, in particular Bayesian inference and Bayesian model choice, including exercises and references. As such, it is rather standard, in that the material could be found as well in textbooks. Statistics textbooks.

When introducing the Bayesian perspective, Roberto Trotta advances several arguments in favour of this approach. The first one is that it is generally easier to follow a Bayesian approach than to seek a non-Bayesian one that recovers long-term properties. (Although there are inconsistent Bayesian settings.) The second one is that Bayesian modelling naturally handles nuisance parameters, because there are essentially no nuisance parameters. (Even though preventing small world modelling may lead to difficulties, as in the Robbins-Wasserman paradox.) The following two reasons are the incorporation of prior information and the appeal of conditioning on the actual data.

The document also includes the above nice illustration of the concentration of measure as the dimension of the parameter increases. (Although one should not over-interpret it, as the concentration does not occur in the same way for a normal distribution, for instance.) It further spends quite some space on the Bayes factor, its scaling as a natural Occam’s razor, and the comparison with p-values, before (unsurprisingly) introducing nested sampling. And the Savage-Dickey ratio. The conclusion of this model choice section proposes some open problems, with a rather unorthodox (in the Bayesian sense) line on the justification of priors and the notion of a “correct” prior (yeech!), plus a musing about adopting a loss function, with which I quite agree.

Bayesian astrostats under Laplace’s gaze

Posted in Books, Kids, pictures, Statistics, Travel, University life, Wines on October 11, 2016 by xi'an

This afternoon, I was part of a jury for an astrostatistics thesis, where the astronomy part was about binary objects in the Solar System and the statistics part, unsurprisingly, about detecting patterns in those objects. The first part was highly classical, using several non-parametric tests like Kolmogorov-Smirnov to test whether those binary objects differed from single objects. While the p-values were very tiny, I felt these values were over-interpreted in the thesis, because a sample size of N=30 leads to some scepticism about numerical quantities like 0.0008. While I do not want to sound as if pushing for Bayesian solutions in every setting, this case is a good illustration of the nefarious power of p-values, which are almost always taken at face value, i.e., where 0.0008 is understood in terms of the null hypothesis and not in terms of the observed realisation of the p-value. Even within a frequentist framework, the distribution of this p-value should be evaluated or estimated one way or another, as there is no reason to believe it is anywhere near a Uniform(0,1) distribution.

The second part of the thesis was about estimating some parameters of the laws of the orbits of those dual objects, and the point of interest for me was the purely mechanical construction of a likelihood function as an exponential transform of a sum of residuals, made of squared differences between the observations and their expectations. Or of a power of such differences. This was called the “statistical model” in the thesis, and I presume in part of the astrostats literature. It reminded me of the first meeting I had with my colleagues from Besançon, where they could not use such mechanical versions because of intractable expectations and used instead simulations from their physical model, literally reinventing ABC. This resolution had the same feeling, closer to indirect inference than regular inference, although it took me half the defence to realise it.
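To make the last point concrete, here is a minimal simulation sketch of mine (not from the thesis) of what happens to Kolmogorov-Smirnov p-values when, as is often done in practice, the mean and variance of the reference Normal are estimated from the same N=30 sample: the asymptotic p-values then pile up near 1 and are nowhere near Uniform(0,1).

```python
import math
import random

def ks_pvalue(sample):
    """Asymptotic KS p-value against a Normal law whose mean and standard
    deviation are estimated from the sample itself (which invalidates the
    usual Uniform(0,1) null distribution of the p-value)."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    xs = sorted(sample)
    d = 0.0
    for i, x in enumerate(xs):
        # fitted Normal cdf at x
        f = 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    # Stephens' small-sample correction, then Kolmogorov's series
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    q = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return max(0.0, min(1.0, q))

random.seed(0)
pvals = [ks_pvalue([random.gauss(0, 1) for _ in range(30)]) for _ in range(2000)]
# under a Uniform(0,1) law the mean would be 0.5 and 5% of the p-values
# would fall below 0.05; with estimated parameters neither holds
print(sum(pvals) / len(pvals))                      # well above 0.5
print(sum(p < 0.05 for p in pvals) / len(pvals))    # well below 0.05
```

The direction of the distortion (conservative here) matters less than the fact that the realised p-value cannot be read as if it were a uniform draw under the null.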

The defence actually took place in the beautiful historical Perrault building of the Observatoire de Paris, in downtown Paris, where Cassini, Arago and Le Verrier once ruled! It was held in the council room, under paintings of major French astronomers, including Laplace himself, looking quite smug in his academician costume. The building straddles the Paris Zero Meridian (which got dethroned in 1911 by the Greenwich Zero Meridian, which I contemplated as a kid, since my childhood church had the Greenwich meridian drawn on its nave stones). The customary “pot” after the thesis and its validation by the jury took place in the less historical cafeteria of the Observatoire, but it included a jazz big band, which made this thesis defence quite unique in many ways!

snapshots from Nature

Posted in Books, Kids, pictures, University life on September 19, 2016 by xi'an

Among the many interesting things I read from the pile of Nature issues that had accumulated over a month of travelling, with the warning that these are mostly “old” news by now:

  • the very special and untouched case of Cuba in terms of the Zika epidemics, thanks to a long term policy fighting mosquitoes at all levels of the society;
  • an impressive map of the human cortex, whose statistical analysis would be fascinating;
  • an excerpt from Nature 13 August 1966 where the Poisson distribution was said to describe the distribution of scores during the 1966 World Cup;
  • an analysis of a genetic experiment on evolution involving 50,000 generations (!) of Escherichia coli;
  • a look back at the great novel Flowers for Algernon, which I read eons ago;
  • a Nature paper on the first soft robot, or octobot, along with some easier introduction, which did not tell which kind of operations could be accomplished by such a robot;
  • a vignette on a Science paper about the interaction between honey hunters and honeyguide birds, which I also heard described on the French National Radio, with an experiment comparing the actual (human) hunting song, a basic sentence in the local language, and the imitation of the song of another bird. I could not understand why the experiment did not include hunting songs from other hunting groups, as they are highly different but just as effective; it would have helped in understanding how innate the reaction of the bird is;
  • another literary entry, on the science behind Mary Shelley’s Frankenstein;
  • a study of the Mathematical Genealogy Project in terms of the few mathematicians who started most genealogies of mathematicians, including d’Alembert, advisor to Laplace, of whom I am one of the many descendants; although the finding is not that astounding when considering usual genealogies, where most branches die off, and the highly hierarchical structure of power in universities of old.

same data – different models – different answers

Posted in Books, Kids, Statistics, University life on June 1, 2016 by xi'an

An interesting question from a reader of The Bayesian Choice came out on X validated last week. It was about Laplace’s succession rule, which I find somewhat over-used, but it was nonetheless interesting because the question was about the discrepancy between the “non-informative” answers derived from two models applied to the same data: a Hypergeometric distribution in The Bayesian Choice and a Binomial on Wikipedia. The originator of the question had trouble with the difference between those two “non-informative” answers, as she or he believed there was a single non-informative principle that should lead to a unique answer. This does not hold, even when following a reference prior principle like Jeffreys’ invariant rule or Jaynes’ maximum entropy tenets. For instance, the Jeffreys priors associated with a Binomial and a Negative Binomial distribution differ. Even less so when considering that there is no unique way of reaching those reference priors. (Not even mentioning the issue of the reference dominating measure in the definition of the entropy.) This led to an informative debate, which is the very point of X validated.
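To make the Binomial versus Negative Binomial point concrete, here is a minimal numerical sketch of mine (not part of the X validated exchange), using the standard forms of the two Jeffreys priors: Beta(1/2,1/2) under Binomial sampling, and the improper Beta(0,1/2) under Negative Binomial sampling. The same data, x successes in n trials, then leads to different posterior means.

```python
# Posterior means for the success probability p after observing x successes
# in n trials, under the two Jeffreys priors (standard conjugate results):
#   Binomial sampling:          prior Beta(1/2, 1/2) -> posterior Beta(x+1/2, n-x+1/2)
#   Negative Binomial sampling: improper Beta(0, 1/2) -> posterior Beta(x, n-x+1/2)

def posterior_mean_binomial(x, n):
    return (x + 0.5) / (n + 1.0)

def posterior_mean_negative_binomial(x, n):
    # the posterior is proper as soon as x > 0
    return x / (n + 0.5)

x, n = 3, 10
print(posterior_mean_binomial(x, n))           # 0.3181...
print(posterior_mean_negative_binomial(x, n))  # 0.2857...
```

Same likelihood up to a constant, two “non-informative” priors, two different answers: exactly the non-uniqueness at stake in the question.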

On a completely unrelated topic, the survey ship looking for the black boxes of the crashed EgyptAir plane is called the Laplace.

global-local mixtures

Posted in Books, pictures, Running, Statistics, Travel on May 4, 2016 by xi'an

Anindya Bhadra, Jyotishka Datta, Nick Polson and Brandon Willard have arXived this morning a short paper on global-local mixtures. Although the definition given in the paper (p.1) is rather unclear, those mixtures are distributions of a sample that are marginals over component-wise (local) and common (global) parameters. The observations of the sample are (marginally) exchangeable if not independent.

“The Cauchy-Schlömilch transformation not only guarantees an ‘astonishingly simple’ normalizing constant for f(·), it also establishes the wide class of unimodal densities as global-local scale mixtures.”

The paper relies on the Cauchy-Schlömilch identity

\int_0^\infty f(\{x-g(x)\}^2)\text{d}x=\int_0^\infty f(y^2)\text{d}y\qquad \text{with}\quad g(x)=g^{-1}(x)

a self-inverse function. This generic result proves helpful in deriving demarginalisations of a Gaussian distribution for densities outside the exponential family, like Laplace’s. (This is getting very local for me, as Cauchy‘s house is up the hill, while Laplace lived two train stations away. Before trains were invented, of course.) And for logistic regression. The paper also briefly mentions Étienne Halphen for his introduction of generalised inverse Gaussian distributions. Halphen was one of the rare French Bayesians, worked for the State Electricity Company (EDF), and briefly with Lucien Le Cam (before the latter left for the USA). He introduced some families of distributions during the early 1940s, including the generalised inverse Gaussian family, which was first presented by his friend Daniel Dugué to the Académie des Sciences, maybe because of the Vichy racial laws…

A second result of interest in the paper is that, given a density g and a transform s on the positive real numbers that is decreasing and self-inverse, the function f(x)=2g(x−s(x)) is again a density, which can itself be represented as a global-local mixture. [I wonder if these representations could prove useful in studying the Cauchy conjecture solved last year by Natesh and Xiao-Li.]
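As a quick numerical sanity check of the identity (my own sketch, not from the paper), take f(t)=exp(−t) and the self-inverse choice g(x)=1/x: both integrals should then equal √π/2.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    # simple midpoint rule, accurate enough for this smooth integrand
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

f = lambda t: math.exp(-t)   # f is applied to the squared argument
g = lambda x: 1.0 / x        # self-inverse: g(g(x)) = x

# the infinite ranges are truncated where the integrands are negligible
lhs = midpoint_integral(lambda x: f((x - g(x)) ** 2), 1e-6, 50.0)
rhs = midpoint_integral(lambda y: f(y ** 2), 0.0, 50.0)
print(lhs, rhs)  # both ≈ √π/2 ≈ 0.8862
```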

Gauss to Laplace transmutation interpreted

Posted in Books, Kids, Statistics, University life on November 9, 2015 by xi'an

Following my earlier post [induced by browsing X validated] on the strange property that the product of a Normal variate by the square root of an Exponential variate is a Laplace variate, I was contacted by Peng Ding from UC Berkeley, who showed me how to derive the result by a mere algebraic transform, related to the decomposition

(X+Y)(X-Y)=X²-Y² ~ 2XY

when X, Y are iid Normal N(0,1). Peng Ding and Joseph Blitzstein have now arXived a note detailing this derivation, along with another derivation using the moment generating function. By coincidence, I also came across another interesting representation on X validated, namely that, when X and Y are Normal N(0,1) variates with correlation ρ,

XY ~ R(cos(πU)+ρ)

with R Exponential and U Uniform (0,1). As shown by the OP of that question, it is a direct consequence of the decomposition of (X+Y)(X-Y) and of the polar or Box-Muller representation. This does not lead to a standard distribution of course, but remains a nice representation of the product of two Normals.
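Here is a quick Monte Carlo sketch of mine (not from the X validated thread) to check this representation, assuming R is a unit rate Exponential, a choice consistent with E[XY]=ρ:

```python
import math
import random

random.seed(7)
N = 300_000
rho = 0.5                        # correlation between X and Y
s = math.sqrt(1 - rho ** 2)

prod, rep = [], []
for _ in range(N):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    # XY with (X, Y) jointly Normal, corr(X, Y) = rho
    prod.append(z1 * (rho * z1 + s * z2))
    # R(cos(pi U) + rho) with R ~ Exp(1) and U ~ Uniform(0,1)
    rep.append(random.expovariate(1.0) * (math.cos(math.pi * random.random()) + rho))
prod.sort()
rep.sort()
# the empirical quantiles of the two constructions should agree
print(prod[N // 2], rep[N // 2])
```

The agreement across quantiles, not just in mean, is what distinguishes a distributional identity from a mere moment match.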

Gauss to Laplace transmutation!

Posted in Kids, Statistics, University life, Books on October 14, 2015 by xi'an

When browsing X validated the other day [translate as: procrastinating!], I came upon the strange property that the marginal distribution of a zero mean normal variate with exponential variance is a Laplace distribution. I first thought there was a mistake, since we usually take an inverse Gamma prior on the variance parameter, not a Gamma, in which case the marginal is a t distribution. The result is curious and can be expressed in a variety of ways:

– the product of a χ₁ and of a χ₂ is (half) a χ²₂;
– the determinant of a 2×2 normal matrix is a Laplace variate;
– a difference of exponentials is Laplace…

The OP was asking for a direct proof of the result, and I eventually sorted it out through a series of changes of variables, although there exists a much more elegant and general proof by Mike West, then at the University of Warwick, based on characteristic functions (or Fourier transforms). It reminded me that continuous, unimodal [at zero] and symmetric densities are necessarily scale mixtures [a wee misnomer] of Gaussians. Mike proves in this paper that exponential power densities [including both the Normal and the Laplace cases] correspond to the variance having an inverse positive stable distribution with half the power. And this is a straightforward consequence of the exponential power density being proportional to the Fourier transform of a stable distribution, combined with a Fubini inversion. (Incidentally, the processing times of Biometrika were not that impressive at the time, with a 2-page paper submitted in Dec. 1984 only published in Sept. 1987!)
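As a small simulation check of the transmutation itself (my own sketch), take the variance to be twice a unit Exponential, so that the marginal is a standard Laplace: the empirical quantiles of the mixture should then match the Laplace quantile function.

```python
import math
import random

random.seed(1)
N = 400_000

# N(0,1) variate whose variance is 2 * Exp(1): marginally a standard Laplace
xs = sorted(math.sqrt(2 * random.expovariate(1.0)) * random.gauss(0, 1)
            for _ in range(N))

def laplace_quantile(p):
    # quantile function of the standard Laplace density exp(-|x|)/2
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(round(xs[int(p * N)], 3), round(laplace_quantile(p), 3))
```

The factor 2 in the variance is only a matter of scale: any exponential rate produces a Laplace marginal, just with a different scale parameter.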

This is a very nice and general derivation, but I still miss the intuition as to why it happens that way. But then, I know nothing, and even less about products of random variates!