**A**s pointed out by Peter Coles on his blog, In the Dark, Hyungsuk Tak, Sujit Ghosh, and Justin Ellis just arXived a review of the unsafe use of improper priors in astronomy papers, 24 out of 75 having failed to establish that the corresponding posteriors are well-defined. And they exhibit such an instance (of impropriety) in a MNRAS paper by Pihajoki (2017), which is a complexification of Gelfand et al. (1990), also used by Jim Hobert in his thesis. (Even though the formal argument used to show the impropriety of the posterior in Pihajoki’s paper does not sound right since it considers divergence at a single value of a parameter β.) Besides repeating this warning about an issue that was rather quickly identified in the infancy of MCMC, if not in the very first publications on the Gibbs sampler, the paper seems to argue against using improper priors due to this potential danger, stating that instead proper priors that include all likely values and beyond are to be preferred. Which reminds me of the BUGS feature of using a N(0,10⁹) prior instead of the flat prior, missing the fact that “very large” variances do impact the resulting inference (if only for the issue of model comparison, remember Lindley-Jeffreys!). And are informative in that sense. However, it is obviously a good idea to advise checking for propriety (!) and using such alternatives may come as a safety button, providing a comparison benchmark to spot possible divergences in the resulting inference.

## Archive for astrostatistics

## improperties on an astronomical scale

Posted in Books, pictures, Statistics with tags astronomy, astrostatistics, Bayesian inference, BUGS, improper posteriors, impropriety, noninformative priors, vague priors on December 15, 2017 by xi'an## [Astrostat summer school] fogrise [jatp]

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life with tags astrostatistics, Autrans, fog, inversion, MCMC algorithms, Monte Carlo Statistical Methods, ski jump, summer school, sunrise, Vercors on October 11, 2017 by xi'an## [Astrostat summer school] sunrise [jatp]

Posted in Statistics with tags astrostatistics, Autrans, Belledone, Fall, fog, French Alps, Grande Chartreuse, Grenoble, jatp, summer school, sunrise, Vercors on October 10, 2017 by xi'an## [summer Astrostat school] room with a view [jatp]

Posted in Mountains, pictures, R, Running, Statistics, Travel, University life with tags astrostatistics, Aussois, Banff, Bayesian statistics, France, Grenoble, Les Diablerets, R, RStudio, summer school, Vercors on October 9, 2017 by xi'an**I** just arrived in Autrans, on the Plateau du Vercors overlooking Grenoble and the view is fabulistic! Trees have started to turn red and yellow, the weather is very mild, and my duties are restricted to teaching ABC to a group of enthusiastic astronomers and cosmologists..! Second advanced course on ABC in the mountains this year, hard to beat (except by a third course). The surroundings are so serene and peaceful that I even conceded to install RStudio for my course! Instead of sticking to my favourite vim editor and line commands.

## Bayesian methods in cosmology

Posted in Statistics with tags ABC, astrostatistics, Autrans, Bayesian Methods in Cosmology, Grenoble, Les Diablerets, nested sampling, Ockham's razor, Pierre Simon de Laplace, Roberto Trotta, Saas Fee, Switzerland, Vercors on January 18, 2017 by xi'an**A** rather massive document was arXived a few days ago by Roberto Trotta on Bayesian methods for cosmology, in conjunction with an earlier winter school, the 44th Saas Fee Advanced Course on Astronomy and Astrophysics, “Cosmology with wide-field surveys”. While I never had the opportunity to give a winter school in Saas Fee, I will give next month a course on ABC to statistics graduates in another Swiss dream location, Les Diablerets. And next Fall a course on ABC again but to astronomers and cosmologists, in Autrans, near Grenoble.

The course document is an 80 pages introduction to probability and statistics, in particular Bayesian inference and Bayesian model choice. Including exercises and references. As such, it is rather standard in that the material could be found as well in textbooks. Statistics textbooks.

When introducing the Bayesian perspective, Roberto Trotta advances several arguments in favour of this approach. The first one is that it is generally easier to follow a Bayesian approach when compared with seeking a non-Bayesian one, while recovering long-term properties. (Although there are inconsistent Bayesian settings.) The second one is that a Bayesian modelling allows to handle naturally nuisance parameters, because there are essentially no nuisance parameters. (Even though preventing small world modelling may lead to difficulties as in the Robbins-Wasserman paradox.) The following two reasons are the incorporation of prior information and the appeal on conditioning on the actual data.

The document also includes this above and nice illustration of the concentration of measure as the dimension of the parameter increases. (Although one should not over-interpret it. The concentration does not occur in the same way for a normal distribution for instance.) It further spends quite some space on the Bayes factor, its scaling as a natural Occam’s razor, and the comparison with p-values, before (unsurprisingly) introducing nested sampling. And the Savage-Dickey ratio. The conclusion of this model choice section proposes some open problems, with a rather unorthodox—in the Bayesian sense—line on the justification of priors and the notion of a “correct” prior (yeech!), plus an musing about adopting a loss function, with which I quite agree.

## Bayesian astrostats under Laplace’s gaze

Posted in Books, Kids, pictures, Statistics, Travel, University life, Wines with tags Arago, astrostatistics, Greenwich Meridian, jazz, Le Verrier, Louis XIV, Observatoire de Paris, Paris, Paris Meridian, Perrault, Pierre Simon de Laplace, thesis defence on October 11, 2016 by xi'an**T**his afternoon, I was part of a jury of an astrostatistics thesis, where the astronomy part was about binary objects in the Solar System, and the statistics part about detecting patterns in those objects, unsurprisingly. The first part was highly classical using several non-parametric tests like Kolmogorov-Smirnov to test whether those binary objects were different from single objects. While the p-values were very tiny, I felt these values were over-interpreted in the thesis, because the sample size of N=30 leads to some scepticism about numerical quantities like 0.0008. While I do not want to sound pushing for Bayesian solutions in every setting, this case is a good illustration of the nefarious power of p-values, which are almost always taken at face value, i.e., where 0.008 is understood in terms of the null hypothesis and not in terms of the observed realisation of the p-value. Even within a frequentist framework, the distribution of this p-value should be evaluated or estimated one way or another, as there is no reason to believe it is anywhere near a Uniform(0,1) distribution.The second part of the thesis was about the estimation of some parameters of the laws of the orbits of those dual objects and the point of interest for me was the purely mechanical construction of a likelihood function that was an exponential transform of a sum of residuals, made of squared differences between the observations and their expectations. Or a power of such differences. This was called the “statistical model” in the thesis and I presume in part of the astrostats literature. This reminded me of the first meeting I had with my colleagues from Besançon, where they could not use such mechanical versions because of intractable expectations and used instead simulations from their physical model, literally reinventing ABC. This resolution had the same feeling, closer to indirect inference than regular inference, although it took me half the defence to realise it.

The defence actually took part in the beautiful historical Perrault’s building of Observatoire de Paris, in downtown Paris, where Cassini, Arago and Le Verrier once ruled! In the council room under paintings of major French astronomers, including Laplace himself, looking quite smug in his academician costume. The building is built around the Paris Zero Meridian (which got dethroned in 1911 by the Greenwich Zero Meridian, which I contemplated as a kid since my childhood church had the Greenwich drawn on the nave stones). The customary “pot” after the thesis and its validation by the jury was in the less historical cafeteria of the Observatoire, but it included a jazz big band, which made this thesis defence quite unique in many ways!

## Savage-Dickey supermodels

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags astrostatistics, Bayes factor, Biometrika, Brad Carlin, bridge sampling, cosmology, encompassing model, MCMC, mixtures of distributions, nested sampling, Péru, Sid Chib on September 13, 2016 by xi'an**A**. Mootoovaloo, B. Bassett, and M. Kunz just arXived a paper on the computation of Bayes factors by the Savage-Dickey representation through a supermodel (or encompassing model). (I wonder why Savage-Dickey is so popular in astronomy and cosmology statistical papers and not so much elsewhere.) Recall that the trick is to write the Bayes factor in favour of the encompasssing model as the ratio of the posterior and of the prior for the tested parameter (thus eliminating nuisance or common parameters) at its null value,

B^{10}=π(φ⁰|x)/π(φ⁰).

Modulo some continuity constraints on the prior density, and the assumption that the conditional prior on nuisance parameter is the same under the null model and the encompassing model [given the null value φ⁰]. If this sounds confusing or even shocking from a mathematical perspective, check the numerous previous entries on this topic on the ‘Og!

The supermodel created by the authors is a mixture of the original models, as in our paper, and… *hold the presses!*, it is a mixture of the likelihood functions, as in Phil O’Neill’s and Theodore Kypraios’ paper. Which is not mentioned in the current paper and should obviously be. In the current representation, the posterior distribution on the mixture weight α is a linear function of α involving both evidences, α(m¹-m²)+m², times the artificial prior on α. The resulting estimator of the Bayes factor thus shares features with bridge sampling, reversible jump, and the importance sampling version of nested sampling we developed in our Biometrika paper. In addition to O’Neill and Kypraios’s solution.

The following quote is inaccurate since the MCMC algorithm needs simulating the parameters of the compared models in realistic settings, hence representing the multidimensional integrals by Monte Carlo versions.

“Though we have a clever way of avoiding multidimensional integrals to calculate the Bayesian Evidence, this new method requires very efficient sampling and for a small number of dimensions is not faster than individual nested sampling runs.”

I actually wonder at the sheer rationale of running an intensive MCMC sampler in such a setting, when the weight α is completely artificial. It is only used to jump from one model to the next, which sound quite inefficient when compared with simulating from both models separately and independently. This approach can also be seen as a special case of Carlin’s and Chib’s (1995) alternative to reversible jump. Using instead the Savage-Dickey representation is of course infeasible. Which makes the overall reference to this method rather inappropriate in my opinion. Further, the examples processed in the paper all involve (natural) embedded models where the original Savage-Dickey approach applies. Creating an additional model to apply a pseudo-Savage-Dickey representation does not sound very compelling…

Incidentally, the paper also includes a discussion of a weird notion, the likelihood of the Bayes factor, B¹², which is plotted as a distribution in B¹², most strangely. The only other place I met this notion is in Murray Aitkin’s book. Something’s unclear there or in my head!

“One of the fundamental choices when using the supermodel approach is how to deal with common parameters to the two models.”

This is an interesting question, although maybe not so relevant for the Bayes factor issue where it should not matter. However, as in our paper, multiplying the number of parameters in the encompassing model may hinder convergence of the MCMC chain or reduce the precision of the approximation of the Bayes factor. Again, from a Bayes factor perspective, this does not matter [while it does in our perspective].