**A**s pointed out by Peter Coles on his blog, In the Dark, Hyungsuk Tak, Sujit Ghosh, and Justin Ellis just arXived a review of the unsafe use of improper priors in astronomy papers, 24 out of 75 having failed to establish that the corresponding posteriors are well-defined. And they exhibit such an instance (of impropriety) in a MNRAS paper by Pihajoki (2017), which is a complexification of Gelfand et al. (1990), also used by Jim Hobert in his thesis. (Even though the formal argument used to show the impropriety of the posterior in Pihajoki’s paper does not sound right since it considers divergence at a single value of a parameter β.) Besides repeating this warning about an issue that was rather quickly identified in the infancy of MCMC, if not in the very first publications on the Gibbs sampler, the paper seems to argue against using improper priors due to this potential danger, stating that instead proper priors that include all likely values and beyond are to be preferred. Which reminds me of the BUGS feature of using a N(0,10⁹) prior instead of the flat prior, missing the fact that “very large” variances do impact the resulting inference (if only for the issue of model comparison, remember Lindley-Jeffreys!). And are informative in that sense. However, it is obviously a good idea to advise checking for propriety (!) and using such alternatives may come as a safety button, providing a comparison benchmark to spot possible divergences in the resulting inference.
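The remark on "very large" variances can be made concrete with a minimal sketch, assuming a toy setting that is not taken from the review: a single observation x~N(θ,1), and a test of θ=0 against θ~N(0,τ²). Both marginals are then Normal and the Bayes factor is available in closed form. The function names and the value x=2.5 are illustrative assumptions.

```python
import numpy as np

def bayes_factor_01(x, tau2):
    """B01 for H0: theta = 0 against H1: theta ~ N(0, tau2), one x ~ N(theta, 1).

    Marginals: m0(x) = N(x; 0, 1) and m1(x) = N(x; 0, 1 + tau2).
    """
    return np.sqrt(1.0 + tau2) * np.exp(-0.5 * x**2 * tau2 / (1.0 + tau2))

def posterior_mean_h1(x, tau2):
    """Posterior mean of theta under H1 (conjugate Normal update)."""
    return x * tau2 / (1.0 + tau2)

x = 2.5  # illustrative observation (assumption)
for tau2 in (1.0, 1e2, 1e4, 1e9):
    print(f"tau^2 = {tau2:8.0e}   E[theta|x] = {posterior_mean_h1(x, tau2):.4f}"
          f"   B01 = {bayes_factor_01(x, tau2):.3f}")
```

In this toy case the posterior mean stabilises as τ² grows, while the Bayes factor keeps increasing like τ in favour of the null, which is the Lindley-Jeffreys effect alluded to above.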

## Archive for astronomy

## improperties on an astronomical scale

Posted in Books, pictures, Statistics with tags astronomy, astrostatistics, Bayesian inference, BUGS, improper posteriors, impropriety, noninformative priors, vague priors on December 15, 2017 by xi'an

## Astrostatistics school

Posted in Mountains, pictures, R, Statistics, Travel, University life with tags ABC, ABC model choice, abcrf, abctools package, Alps, astronomy, Autrans, Bayesian inference, Bayesian Methods in Cosmology, big wall, cosmology, Dickey-Savage ratio, Fall, mountains, nested sampling, R, random forests, rock climbing, RStudio, socks, trail running, Vercors on October 17, 2017 by xi'an

**W**hat a wonderful week at the Astrostat [Indian] summer school in Autrans! The setting was superb, on the high Vercors plateau overlooking both Grenoble [north] and Valence [west], with the colours of the Fall at their brightest on the foliage of the forests rising on both sides of the valley and a perfect green on the fields at the centre, with sun all along, sharp mornings and warm afternoons worthy of a late Indian summer, too many running trails [turning into X country ski trails in the Winter] to contemplate for a single week [even with three hours of running over two days], many climbing sites on the numerous chalk cliffs all around [but a single afternoon for that, more later in another post!]. And of course a group of participants eager to learn about Bayesian methodology and computational algorithms, from diverse [astronomy, cosmology and more] backgrounds, trainings and countries. I was surprised at the dedication of the participants travelling all the way from Chile, Peru, and Hong Kong for the sole purpose of attending the school. David van Dyk gave the first part of the school on Bayesian concepts and MCMC methods, Roberto Trotta the second part on Bayesian model choice and hierarchical models, and myself a third part on, surprise, surprise!, approximate Bayesian computation. Plus practicals on R.

As it happens Roberto had to cancel his participation and I turned for a session into Christian Roberto, presenting his slides in the most objective possible fashion!, as a significant part covered nested sampling and Savage-Dickey ratios, not exactly my favourites for estimating constants. David joked that he was considering postponing his flight to see me talk about these, but I hope I refrained from engaging in controversy and criticisms… If only because this was not of interest for the participants. Indeed when I started presenting ABC through what I thought was a pedestrian example, namely Rasmus Bååth’s socks, I found that the main concern was not running an MCMC sampler or a substitute ABC algorithm but rather a healthy questioning of the construction of the informative prior in that artificial setting, which made me quite glad I had planned to cover this example rather than an advanced model [as, e.g., one of those covered in the packages abc, abctools, or abcrf]. Because it generated those questions about the prior [why a Negative Binomial? why these hyperparameters? &tc.] and showed how programming ABC turned into a difficult exercise even in this toy setting. And while I wanted to give my usual warning about ABC model choice and argue for random forests as a summary selection tool, I feel I should have focussed instead on another example, as this exercise brings out so clearly the conceptual difficulties with what is taught. Making me quite sorry I had to leave one day earlier. [As did missing an extra run!] Coming back by train through the sunny and grape-covered slopes of Burgundy hills was an extra reward [and no one on the train commented about the local cheese travelling in my bag!]
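For readers who have not met the socks example, here is a minimal sketch of an ABC rejection sampler for it. Everything numerical is an assumption made for illustration: the hyperparameters of the Negative Binomial prior on the total number of socks, the Beta prior on the proportion of paired socks, and the hypothetical observation that 11 socks were picked with no pair among them. The function names are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_socks(n_picked, mu=30.0, sd=15.0, a=15.0, b=2.0):
    """Draw (n_socks, n_singletons, n_pairs) from the prior predictive.

    mu, sd: mean and sd of the Negative Binomial prior (illustrative values).
    a, b:   Beta prior on the proportion of socks that come in pairs.
    """
    size = mu**2 / (sd**2 - mu)          # requires sd^2 > mu
    n_socks = rng.negative_binomial(size, size / (size + mu))
    prop_pairs = rng.beta(a, b)
    n_pairs = int(round((n_socks // 2) * prop_pairs))
    n_odd = n_socks - 2 * n_pairs
    # Paired socks share a label, odd socks get a label of their own.
    socks = np.concatenate([np.repeat(np.arange(n_pairs), 2),
                            np.arange(n_pairs, n_pairs + n_odd)])
    picked = rng.permutation(socks)[:n_picked]
    counts = np.unique(picked, return_counts=True)[1]
    return int(n_socks), int((counts == 1).sum()), int((counts == 2).sum())

# Hypothetical data: 11 socks picked, all of them distinct.
obs_singletons, obs_pairs = 11, 0
draws = [simulate_socks(obs_singletons + 2 * obs_pairs) for _ in range(20000)]
accepted = np.array([n for n, s, p in draws
                     if s == obs_singletons and p == obs_pairs])

print("accepted draws:", accepted.size)
print("posterior median of the number of socks:", np.median(accepted))
```

Since the summary is discrete, acceptance can require an exact match and the accepted draws are exact posterior draws under the assumed prior. The sketch also makes the participants' point visible: every default argument of `simulate_socks` is a prior choice that has to be argued for.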

## it’s only an eclipse, for heavens sake!

Posted in Statistics with tags astronomy, eclipse, rationalism, religions, secularism, solar system on September 2, 2017 by xi'an

**I** have been amazed and utterly baffled by the number of commentaries about the solar eclipse of last week that involved metaphysics and religious aspects. An eclipse is a most natural [and beautiful] phenomenon of one astronomical object getting in front of another one in a very predictable way: no reason to invoke deities or spirits in the process!

## Bye, Rosetta!

Posted in pictures, Travel with tags 67P/Churyumov–Gerasimenko, astronomy, comets, European Space Agency, Philae lander, Rosetta, space probe on September 30, 2016 by xi'an

## the curious incident of the inverse of the mean

Posted in R, Statistics, University life with tags astronomy, Bayesian inference, inverse problems, parallaxes on July 15, 2016 by xi'an

**A**s I figured out while working with astronomer colleagues last week, a strange if understandable difficulty proceeds from the simplest and most studied statistical model, namely the Normal model

x~N(θ,1)

Indeed, if one reparametrises this model as x~N(υ⁻¹,1) with υ>0, a *single* observation x brings very little information about υ! (This is not a toy problem as it corresponds to estimating distances from observations of parallaxes.) If x gets large, υ is very likely to be small, but if x is small or negative, υ is certainly large, with no power to discriminate between highly different values. For instance, Fisher’s information for this model and parametrisation is υ⁻⁴ [the unit information on θ=υ⁻¹ multiplied by (dθ/dυ)²=υ⁻⁴] and thus collapses to zero as υ grows.

While one can always hope for Bayesian miracles, they do not automatically occur. For instance, working with a Gamma prior Ga(3,10³) on υ [as informed by a large astronomy dataset] leads to a posterior expectation hardly impacted by the value of the observation x.

And using an alternative estimate like the harmonic posterior mean that is associated with the relative squared error loss does not see much more impact from the observation.

There is simply not enough information contained in one datapoint (or even several datapoints for that matter) to infer about υ.
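Both estimates can be checked numerically with a minimal sketch, under assumptions that are not stated in the post: 10³ is read as the *scale* of the Gamma prior (prior mean 3×10³), the integrals are computed by a trapezoidal rule on a grid in log υ, and the harmonic posterior mean is taken as E[υ⁻¹|x]/E[υ⁻²|x], the Bayes estimate under the loss (υ−δ)²/υ². The function names are hypothetical.

```python
import numpy as np

def log_moment(x, k, shape=3.0, scale=1e3, n=20001):
    """log of the integral of v^k * prior(v) * likelihood(x | v), up to a constant.

    Prior: Ga(shape, scale) on v (scale parametrisation is an assumption).
    Likelihood: x ~ N(1/v, 1).
    """
    t = np.linspace(-10.0, np.log(1e2 * scale), n)       # t = log v
    # v^(shape-1+k) * exp(-v/scale) * N(x; 1/v, 1) * v (Jacobian of v = e^t)
    log_f = (shape + k) * t - np.exp(t) / scale - 0.5 * (x - np.exp(-t)) ** 2
    m = log_f.max()
    w = np.exp(log_f - m)                                # stabilised integrand
    dt = t[1] - t[0]
    return m + np.log(dt * (w.sum() - 0.5 * (w[0] + w[-1])))

def posterior_mean(x):
    return np.exp(log_moment(x, 1) - log_moment(x, 0))

def harmonic_mean(x):
    # Bayes estimate under the relative squared error loss (v - d)^2 / v^2
    return np.exp(log_moment(x, -1) - log_moment(x, -2))

for x in (-2.0, 0.0, 2.0):
    print(f"x = {x:+.1f}   E[v|x] = {posterior_mean(x):8.1f}"
          f"   harmonic = {harmonic_mean(x):8.1f}")
```

Under these assumptions the two estimates can be compared with their prior counterparts, 3×10³ and 10³ respectively, to see how little a moderate x moves them. Reading 10³ as a rate instead would change the picture, so the parametrisation is worth checking against the original analysis.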

## ABC and cosmology

Posted in Books, pictures, Statistics, University life with tags ABC, ABC-PMC, abcpmc, astronomy, astrostatistics, cosmoabc, cosmology, likelihood-free methods, Mahalanobis distance, Python, semi-automatic ABC on May 4, 2015 by xi'an

**T**wo papers appeared on arXiv in the past two days with the similar theme of applying ABC-PMC [one version of which we developed with Mark Beaumont, Jean-Marie Cornuet, and Jean-Michel Marin in 2009] to cosmological problems. (As a further coincidence, I had just started refereeing yet another paper on ABC-PMC in another astronomy problem!) The first paper cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation by Ishida et al. [“et al” including Ewan Cameron] proposes a Python ABC-PMC sampler with applications to galaxy cluster catalogues. The paper is primarily a description of the cosmoabc package, including code snapshots. Earlier occurrences of ABC in cosmology are found for instance in this earlier workshop, as well as in Cameron and Pettitt’s earlier paper. The package offers a way to evaluate the impact of a specific distance, with a 2D-graph demonstrating that the minimum [if not the range] of the simulated distances increases with the parameters getting away from the best parameter values.

“We emphasis [sic] that the choice of the distance function is a crucial step in the design of the ABC algorithm and the reader must check its properties carefully before any ABC implementation is attempted.” E.E.O. Ishida et al.

The second [by one day] paper Approximate Bayesian computation for forward modelling in cosmology by Akeret et al. also proposes a Python ABC-PMC sampler, abcpmc. With fairly similar explanations: maybe both samplers should be compared on a reference dataset. While I first thought the description of the algorithm was rather close to our version, including the choice of the empirical covariance matrix with the factor 2, it appears it is adapted from a tutorial in the Journal of Mathematical Psychology by Turner and van Zandt. One out of many tutorials and surveys on the ABC method, of which I was unaware, but which summarises the pre-2012 developments rather nicely. Except for missing Paul Fearnhead’s and Dennis Prangle’s semi-automatic Read Paper. In the abcpmc paper, the update of the covariance matrix is the one proposed by Sarah Filippi and co-authors, which includes an extra bias term for faraway particles.
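As a reminder of what an ABC-PMC sampler does, here is a minimal one-dimensional sketch with a Normal random walk kernel whose variance is twice the weighted empirical variance of the previous population. It reproduces neither cosmoabc nor abcpmc: the toy model (sample mean of 50 N(θ,1) draws), the N(0,10²) prior, the tolerance sequence, and all names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_summary(theta, n_obs=50):
    """Sample mean of n_obs draws from N(theta, 1), simulated directly."""
    return rng.normal(theta, 1.0 / np.sqrt(n_obs))

def abc_pmc(s_obs, epsilons, n_particles=500, prior_sd=10.0):
    """ABC-PMC for a scalar parameter with a decreasing tolerance sequence."""
    theta, weights, tau = None, None, None
    for t, eps in enumerate(epsilons):
        kept = []
        while len(kept) < n_particles:
            if t == 0:
                prop = rng.normal(0.0, prior_sd, n_particles)      # prior draws
            else:
                idx = rng.choice(n_particles, size=n_particles, p=weights)
                prop = rng.normal(theta[idx], tau)                 # perturbation
            dist = np.abs(simulate_summary(prop) - s_obs)
            kept.extend(prop[dist < eps])
        new = np.array(kept[:n_particles])
        if t == 0:
            new_w = np.ones(n_particles)
        else:
            # importance weight: prior density over the mixture proposal density
            # (normalising constants cancel once the weights are renormalised)
            kern = np.exp(-0.5 * ((new[:, None] - theta[None, :]) / tau) ** 2)
            new_w = np.exp(-0.5 * (new / prior_sd) ** 2) / (kern @ weights)
        theta, weights = new, new_w / new_w.sum()
        mean = np.sum(weights * theta)
        tau = np.sqrt(2.0 * np.sum(weights * (theta - mean) ** 2)) # factor 2
    return theta, weights

theta, w = abc_pmc(s_obs=1.3, epsilons=(1.0, 0.5, 0.2, 0.1))
print("weighted posterior mean:", np.sum(w * theta))
```

A reference problem of this kind, where the exact posterior is known, is one possible basis for comparing two samplers on equal terms.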

“For complex data, it can be difficult or computationally expensive to calculate the distance ρ(x; y) using all the information available in x and y.” Akeret et al.

In both papers, the role of the distance is stressed as being quite important. However, the cosmoabc paper uses an L1 distance [see (2) therein] in a toy example without normalising between mean and variance, while the abcpmc paper suggests using a Mahalanobis distance that turns the d-dimensional problem into a comparison of one-dimensional projections.
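The normalisation issue can be illustrated with a minimal sketch, assuming a toy setting of my own choosing rather than either paper's example: summaries made of the mean and variance of 100 N(0,5²) draws, and a covariance matrix estimated from pilot simulations. The Mahalanobis distance below is the generic one, not the specific construction of the abcpmc paper, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def summaries(mu, sigma, n_obs=100):
    """Summary statistics (sample mean, sample variance) of a Normal sample."""
    x = rng.normal(mu, sigma, n_obs)
    return np.array([x.mean(), x.var(ddof=1)])

# Pilot simulations at a fixed parameter value (illustrative choice).
pilot = np.array([summaries(0.0, 5.0) for _ in range(2000)])
s_obs = summaries(0.0, 5.0)

cov = np.cov(pilot, rowvar=False)        # 2x2 covariance of the summaries
prec = np.linalg.inv(cov)

def l1_distance(s, s_ref):
    return np.abs(s - s_ref).sum()

def mahalanobis(s, s_ref):
    d = s - s_ref
    return np.sqrt(d @ prec @ d)

# Share of the unnormalised L1 distance carried by the variance component.
abs_dev = np.abs(pilot - s_obs)
share_var = abs_dev[:, 1].sum() / abs_dev.sum()
print("standard deviations of (mean, variance):", np.sqrt(np.diag(cov)))
print("share of L1 distance due to the variance summary:", share_var)
print("example distances:", l1_distance(pilot[0], s_obs), mahalanobis(pilot[0], s_obs))
```

Because the sample variance fluctuates on a much larger scale than the sample mean in this setting, the unnormalised L1 distance is driven mostly by the variance summary, whereas the Mahalanobis distance puts both components on a common scale.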