Archive for cosmology

Savage-Dickey supermodels

Posted in Books, Mountains, pictures, Statistics, Travel, University life on September 13, 2016 by xi'an

[Image: The Wider Image: Bolivia's cholita climbers. Combination picture showing Aymara indigenous women (L-R) Domitila Alana, 42, Bertha Vedia, 48, Lidia Huayllas, 48, and Dora Magueno, 50, posing at the Huayna Potosi mountain, Bolivia, April 6, 2016. (c) REUTERS/David Mercado]

A. Mootoovaloo, B. Bassett, and M. Kunz just arXived a paper on the computation of Bayes factors by the Savage-Dickey representation through a supermodel (or encompassing model). (I wonder why Savage-Dickey is so popular in astronomy and cosmology statistical papers and not so much elsewhere.) Recall that the trick is to write the Bayes factor in favour of the null (embedded) model as the ratio of the posterior to the prior density of the tested parameter under the encompassing model [thus eliminating nuisance or common parameters], evaluated at its null value,

B₀₁ = π₁(φ⁰ | x) / π₁(φ⁰),


modulo some continuity constraints on the prior density, and under the assumption that the conditional prior on the nuisance parameters is the same under the null model and the encompassing model [given the null value φ⁰]. If this sounds confusing or even shocking from a mathematical perspective, check the numerous previous entries on this topic on the 'Og!
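
For the record, here is a one-step derivation of the identity (the standard argument, not specific to this paper), writing ω for the nuisance parameters, m₀ and m₁ for the evidences of the null and encompassing models, and using the above matching assumption π₀(ω) = π₁(ω | φ⁰):

\[
\pi_1(\varphi^0 \mid x)
= \frac{\pi_1(\varphi^0) \int f(x \mid \varphi^0, \omega)\, \pi_1(\omega \mid \varphi^0)\, \mathrm{d}\omega}{m_1(x)}
= \pi_1(\varphi^0)\, \frac{m_0(x)}{m_1(x)}
\]

so that dividing by the prior density π₁(φ⁰) recovers B₀₁ = m₀(x)/m₁(x).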

The supermodel created by the authors is a mixture of the original models, as in our paper, and… hold the presses!… it is a mixture of the likelihood functions, as in Phil O'Neill's and Theodore Kypraios' paper. Which is not mentioned in the current paper and obviously should be. In the current representation, the posterior distribution of the mixture weight α is a linear function of α involving both evidences, α(m¹-m²)+m², times the artificial prior on α. The resulting estimator of the Bayes factor thus shares features with bridge sampling, reversible jump, and the importance sampling version of nested sampling we developed in our Biometrika paper, in addition to O'Neill and Kypraios's solution.
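
To see the mechanics on a toy case, here is a minimal, self-contained Python sketch of the mixture-of-likelihoods supermodel (models, names, and settings are all mine, not the paper's code; the second model is parameter-free to keep it short). A random-walk Metropolis sampler explores (α, θ), and the linearity of the marginal posterior of α in the two evidences turns the posterior mean of α into an estimate of B¹²:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, size=20)   # toy data

# Model 1: x_i ~ N(theta, 1) with theta ~ N(0, 1); Model 2: x_i ~ N(0, 1), no parameter.
def loglik1(theta):
    return -0.5 * np.sum((x - theta) ** 2)

loglik2 = -0.5 * np.sum(x ** 2)     # constant, as model 2 is parameter-free

def logpost(alpha, theta):
    # supermodel target: [alpha L1(theta) + (1 - alpha) L2] x N(theta; 0, 1) x U(alpha; 0, 1)
    if not 0.0 < alpha < 1.0:
        return -np.inf
    l1, l2 = loglik1(theta), loglik2
    m = max(l1, l2)                 # log-sum-exp for numerical stability
    return m + np.log(alpha * np.exp(l1 - m) + (1 - alpha) * np.exp(l2 - m)) - 0.5 * theta ** 2

alpha, theta, lp = 0.5, 0.0, logpost(0.5, 0.0)
alphas = []
for _ in range(50_000):             # plain random-walk Metropolis on (alpha, theta)
    a_new, t_new = alpha + 0.1 * rng.normal(), theta + 0.3 * rng.normal()
    lp_new = logpost(a_new, t_new)
    if np.log(rng.uniform()) < lp_new - lp:
        alpha, theta, lp = a_new, t_new, lp_new
    alphas.append(alpha)

# p(alpha|x) is proportional to alpha (m1 - m2) + m2, so under the uniform prior
# E[alpha|x] = (2 m1 + m2) / 3 (m1 + m2), which inverts to B12 = (3 E - 1) / (2 - 3 E)
E = np.mean(alphas[5_000:])
print("estimated B12:", (3 * E - 1) / (2 - 3 * E))

The inversion in the last lines only holds for the uniform prior on α; any other artificial prior changes the linear-in-α posterior and hence the formula.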

The following quote is inaccurate, since in realistic settings the MCMC algorithm requires simulating the parameters of the compared models, hence it represents the multidimensional integrals by their Monte Carlo counterparts.

“Though we have a clever way of avoiding multidimensional integrals to calculate the Bayesian Evidence, this new method requires very efficient sampling and for a small number of dimensions is not faster than individual nested sampling runs.”

I actually wonder at the sheer rationale for running an intensive MCMC sampler in such a setting, when the weight α is completely artificial. It is only used to jump from one model to the next, which sounds quite inefficient when compared with simulating from both models separately and independently. This approach can also be seen as a special case of Carlin's and Chib's (1995) alternative to reversible jump. Using instead the Savage-Dickey representation is of course infeasible. Which makes the overall reference to this method rather inappropriate in my opinion. Further, the examples processed in the paper all involve (natural) embedded models where the original Savage-Dickey approach applies. Creating an additional model to apply a pseudo-Savage-Dickey representation does not sound very compelling…

Incidentally, the paper also includes a discussion of a weird notion, the likelihood of the Bayes factor, B¹², which is plotted as a distribution in B¹², most strangely. The only other place I met this notion is in Murray Aitkin’s book. Something’s unclear there or in my head!

“One of the fundamental choices when using the supermodel approach is how to deal with common parameters to the two models.”

This is an interesting question, although maybe not so relevant for the Bayes factor issue where it should not matter. However, as in our paper, multiplying the number of parameters in the encompassing model may hinder convergence of the MCMC chain or reduce the precision of the approximation of the Bayes factor. Again, from a Bayes factor perspective, this does not matter [while it does in our perspective].

astroABC: ABC SMC sampler for cosmological parameter estimation

Posted in Books, R, Statistics, University life on September 6, 2016 by xi'an

“…the chosen statistic needs to be a so-called sufficient statistic in that any information about the parameter of interest which is contained in the data, is also contained in the summary statistic.”

Elise Jennings and Maeve Madigan arXived a paper on a new Python code they developed for implementing ABC-SMC, towards astronomy or rather cosmology applications. They stress the parallelisation abilities of their approach, which leads to a "crucial speed enhancement" over the available competitors, abcpmc and cosmoabc. The version of ABC implemented there is "our" ABC PMC, where particle clouds are shifted according to mixtures of random walks, based on each and every point of the current cloud, with a scale equal to twice the estimated posterior variance. (The paper curiously refers to non-astronomy papers through their arXiv versions, even when they have been published. Like our 2008 Biometrika paper.) A large part of the paper is dedicated to computing aspects that escape me, like the constant references to MPI. The algorithm is partly automated, except for the choice of the summary statistics and of the distance. The tolerance is chosen as a (large) quantile of the previous set of simulated distances. Getting comments from the designers of abcpmc and cosmoabc would be great.
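
As a point of reference for what this flavour of ABC PMC does, here is a deliberately minimal one-parameter sketch (a toy Gaussian model of my own devising, not the astroABC interface): weighted particles are perturbed by Gaussian random walks with scale twice the weighted variance, and the tolerance at each iteration is a quantile of the previous distances.

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, 50)                # toy "observed" data
s_obs = y.mean()                            # summary statistic (sufficient here)

def distance(theta):
    return abs(rng.normal(theta, 1.0, 50).mean() - s_obs)

N, T, q = 500, 5, 0.5                       # particles, PMC iterations, tolerance quantile

theta = rng.uniform(-10, 10, N)             # iteration 0: plain rejection from the U(-10,10) prior
dist = np.array([distance(t) for t in theta])
w = np.full(N, 1.0 / N)
for _ in range(1, T):
    eps = np.quantile(dist, q)              # shrink the tolerance
    tau2 = 2.0 * np.cov(theta, aweights=w)  # kernel scale: twice the weighted variance
    th_new, d_new, w_new = [], [], []
    while len(th_new) < N:
        cand = theta[rng.choice(N, p=w)] + np.sqrt(tau2) * rng.normal()
        if abs(cand) > 10:                  # outside the prior support
            continue
        d = distance(cand)
        if d <= eps:
            # PMC weight: (flat) prior over the mixture-of-kernels density;
            # the Gaussian normalising constant cancels after renormalisation
            mix = np.sum(w * np.exp(-0.5 * (cand - theta) ** 2 / tau2))
            th_new.append(cand); d_new.append(d); w_new.append(1.0 / mix)
    theta, dist, w = np.array(th_new), np.array(d_new), np.array(w_new)
    w /= w.sum()

print("ABC posterior mean:", np.sum(w * theta), "vs observed mean:", s_obs)

Parallelisation enters exactly in the while loop, where the candidate simulations are embarrassingly parallel; this is presumably where the MPI machinery of the package pays off.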

“It is clear that the simple Gaussian Likelihood assumption in this case, which neglects the effects of systematics yields biased cosmological constraints.”

The last part of the paper compares ABC and MCMC on a simulated supernova dataset. Which is a somewhat dubious comparison, since the model used for producing the data and running ABC is not the same as the Gaussian version used with MCMC. Unsurprisingly, MCMC then misses the true value of the cosmological parameters and, most likely and more importantly, the true posterior HPD region, while ABC-SMC (or PMC) concentrates around the genuine parameter values. (There is no additional demonstration of how accelerated the approach is.)

Bayesian model averaging in astrophysics [guest post]

Posted in Books, pictures, Statistics, University life on August 12, 2015 by xi'an

[Following my posting of a misfiled 2013 blog, Ewan Cameron told me of the impact of this paper in starting his own blog and I asked him for a guest post, resulting in this analysis, much deeper than mine. No warning necessary this time!]

Back in February 2013, when "Bayesian Model Averaging in Astrophysics: A Review" by Parkinson & Liddle (hereafter PL13) first appeared on the arXiv, I was a keen, young(ish) postdoc eager to get stuck into debates about anything and everything 'astro-statistical'. And with its seemingly glaring flaws, PL13 was more grist to the mill. However, despite my best efforts on various forums I couldn't get a decent fight started over the right way to do Bayesian model averaging (BMA) in astronomy, so out of sheer frustration two months later I made my own soapbox to shout from at Another Astrostatistics Blog. Having seen PL13 reviewed recently here on Xi'an's Og, it feels like the right time to revisit the subject and reflect on where BMA in astronomy is today.

As pointed out to me back in 2013 by Tom Loredo, the act of Bayesian model averaging has been around much longer than its name; indeed an early astronomical example appears in Gregory & Loredo (1992) in which the posterior mean representation of an unknown signal is constructed for an astronomical “light-curve”, averaging over a set of constant and periodic candidate models. Nevertheless the wider popularisation of model averaging in astronomy has only recently taken place through a variety of applications in cosmology: e.g. Liddle, Mukherjee, Parkinson & Wang (2006) and Vardanyan, Trotta & Silk (2011).

In contrast to earlier studies like Gregory & Loredo (1992)—or the classic review on BMA by Hoeting et al. (1999)—in which the target of model averaging is typically either a utility function, a set of future observations, or a latent parameter of the observational process (e.g. the unknown "light-curve" shape) shared naturally by all competing models, the proposal of cosmological BMA studies is to produce a model-averaged version of the posterior for a given 'shared' parameter: a so-called "model-averaged PDF". This proposal didn't sit well with me back in 2013, and it still doesn't sit well with me today. Philosophically: without a model a parameter has no meaning, so why should we seek meaning in the distribution of a parameter marginalised over an entire set of models? And, practically: without knowing the model 'label' to which a given mass of model-averaged parameter probability belongs, there is nothing much useful we can do with this 'PDF': nothing much we can say about the data we have just analysed, and nothing much we can say about future experiments. Whereas the space of the observed data is shared automatically by all competing models, it seems to me somehow "un-Bayesian" to place the further restriction that the parameters of separate models share the same scale and topology. I say "un-Bayesian" since this mode of model averaging suggests a formulation of the parameter space + prior pairing stronger than the statement of one's prior beliefs for the distribution of observable data given the model. But I would be happy to hear arguments from the other side in the comments box below…!
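
To make the contested object concrete, here is a tiny made-up Gaussian illustration (two models sharing a location parameter μ but with different known scales; everything in it is for exposition only, not from PL13). The "model-averaged PDF" is the evidence-weighted mixture of the within-model posteriors:

import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, 30)
n, xbar = len(x), x.mean()

def evidence(sigma):
    # marginal likelihood of the model x_i ~ N(mu, sigma^2), mu ~ N(0, 1)
    return quad(lambda mu: np.exp(np.sum(stats.norm.logpdf(x, mu, sigma)))
                * stats.norm.pdf(mu, 0, 1), -5, 5)[0]

def posterior(mu, sigma):
    # conjugate within-model posterior of mu, for known sigma
    v = 1.0 / (1.0 + n / sigma ** 2)
    return stats.norm.pdf(mu, v * n * xbar / sigma ** 2, np.sqrt(v))

m1, m2 = evidence(1.0), evidence(2.0)
w1 = m1 / (m1 + m2)                  # posterior model probability, under equal prior odds
grid = np.linspace(0.0, 2.0, 201)
averaged = w1 * posterior(grid, 1.0) + (1 - w1) * posterior(grid, 2.0)
print("P(M1|x) =", round(w1, 3), "; model-averaged mode near", grid[averaged.argmax()])

The very fact that the two posteriors can be summed pointwise presupposes that μ means the same thing, on the same scale, in both models: exactly the restriction questioned above.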

ABC and cosmology

Posted in Books, pictures, Statistics, University life on May 4, 2015 by xi'an

Two papers appeared on arXiv in the past two days with the shared theme of applying ABC-PMC [one version of which we developed with Mark Beaumont, Jean-Marie Cornuet, and Jean-Michel Marin in 2009] to cosmological problems. (As a further coincidence, I had just started refereeing yet another paper on ABC-PMC in another astronomy problem!) The first paper, cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation by Ishida et al. ["et al." including Ewan Cameron], proposes a Python ABC-PMC sampler with applications to galaxy cluster catalogues. The paper is primarily a description of the cosmoabc package, including code snapshots. Earlier occurrences of ABC in cosmology are found for instance in this earlier workshop, as well as in Cameron and Pettitt's earlier paper. The package offers a way to evaluate the impact of a specific distance, with a 2D graph demonstrating that the minimum [if not the range] of the simulated distances increases as the parameters move away from the best parameter values.
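
A toy rendition of that diagnostic (my own reconstruction, not the cosmoabc code): for each value of the parameter, repeat the simulation many times and record the smallest distance achieved, which should drift upwards as one moves away from the value behind the data.

import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(0.0, 1.0, 100)               # data generated at theta = 0
s_obs = y.mean()

# minimum simulated distance as a function of theta: a cheap check that the
# chosen distance is actually informative about the parameter
for theta in np.linspace(-2, 2, 9):
    dists = [abs(rng.normal(theta, 1.0, 100).mean() - s_obs) for _ in range(200)]
    print(f"theta = {theta:+.1f}   min distance = {min(dists):.4f}")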

“We emphasis [sic] that the choice of the distance function is a crucial step in the design of the ABC algorithm and the reader must check its properties carefully before any ABC implementation is attempted.” E.E.O. Ishida et al.

The second [by one day] paper, Approximate Bayesian computation for forward modelling in cosmology by Akeret et al., also proposes a Python ABC-PMC sampler, abcpmc. With fairly similar explanations: maybe both samplers should be compared on a reference dataset. While I first thought the description of the algorithm was rather close to our version, including the choice of the empirical covariance matrix with the factor 2, it appears it is adapted from a tutorial in the Journal of Mathematical Psychology by Turner and van Zandt. One of many tutorials and surveys of the ABC method, which I was unaware of, but which summarises the pre-2012 developments rather nicely. Except for missing Paul Fearnhead's and Dennis Prangle's semi-automatic Read Paper. In the abcpmc paper, the update of the covariance matrix is the one proposed by Sarah Filippi and co-authors, which includes an extra bias term for faraway particles.

“For complex data, it can be difficult or computationally expensive to calculate the distance ρ(x; y) using all the information available in x and y.” Akeret et al.

Both papers stress that the choice of the distance is quite important. However, the cosmoabc paper uses an L1 distance [see (2) therein] in a toy example without normalising between mean and variance, while the abcpmc paper suggests using a Mahalanobis distance, which turns the d-dimensional problem into a comparison of one-dimensional projections.
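
For concreteness, here is a small numerical contrast between the two choices, on hypothetical four-dimensional summaries with wildly different scales (none of this is the packages' code):

import numpy as np

rng = np.random.default_rng(3)
scales = np.array([1.0, 10.0, 0.1, 100.0])      # deliberately mismatched components
pilot = rng.normal(size=(1000, 4)) * scales     # pilot simulations of the summaries
cov = np.cov(pilot, rowvar=False)               # covariance estimated from the pilot runs
s_obs, s_sim = pilot[0], pilot[1]

l1 = np.abs(s_sim - s_obs).sum()                # unnormalised L1: the large-scale component dominates
d = s_sim - s_obs
maha = np.sqrt(d @ np.linalg.solve(cov, d))     # Mahalanobis: each direction contributes on a common scale
print("L1 =", l1, "  Mahalanobis =", maha)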

the latest Significance: Astrostats, black swans, and pregnant drivers [and zombies]

Posted in Books, Kids, pictures, Statistics, Travel, University life on February 4, 2015 by xi'an

Reading Significance is always an enjoyable moment, when I can find time to skim through the articles (before my wife gets hold of it!). This time, I lost my copy between my office and home, and borrowed it from Tom Nichols at Warwick, with four mornings to read it during breakfast. This December issue is definitely interesting, as it contains several introductory articles on astro- and cosmo-statistics! One thing I had not noticed before is how large a fraction of the papers is written by authors of books, giving a quick entry to or interview about their book. For instance, I found out that Roberto Trotta had written a general-public book called The Edge of the Sky (All You Need to Know About the All-There-Is), which exposes the fundamentals of cosmology through the 1000 most common words in the English language. So Universe is replaced with All-There-Is! I can understand and to some extent applaud the intention, but it nonetheless makes for a painful read, judging from the excerpt, when researcher and telescope are not part of the accepted vocabulary. Reading the corresponding article in Significance left me a bit bemused at the reason provided for the existence of a multiverse, i.e., of multiple replicas of our universe, all with different conditions: multiplying the universes makes ours more likely, while it sounds almost impossible on its own! This sounds like a very frequentist argument… and I am not even certain it would convince a frequentist. The other articles in this special astrostatistics section were of a more statistical nature, from estimating the number of galaxies to the chances of a big asteroid impact. I also liked the graphical representation of the meteorite impacts in the past century, because of the impact drawing in the background. And when I checked the link to Carlo Zapponi's website, I found the picture was a still of a neat animation of meteorites falling since the first report.


the intelligent-life lottery

Posted in Books, Kids on August 24, 2014 by xi'an

[Image: monkey at Amber Fort]

In a theme connected with one argument in Dawkins' The God Delusion, The New York Times just published a piece on the 20th anniversary of the debate between Carl Sagan and Ernst Mayr about the likelihood of the appearance of intelligent life. While 20 years ago there was very little evidence, if any, of the existence of Earth-like planets, the current estimate is about 40 billion… The argument against the high likelihood of other inhabited planets is that the appearance of life on Earth required an accumulation of unlikely events. This is where the paper goes off-road and into the ditch, in my opinion, as it compares the emergence of intelligent (human-level) life with being "as likely as if a Powerball winner kept buying tickets and — round after round — hit a bigger jackpot each time". The latter has a very clearly defined probability of occurring, since "the chance of winning the grand prize is about one in 175 million". The paper does not tell where such an assessment can be found for the emergence of human life, and I very much doubt it can be justified. Given the myriad of species found throughout the history of evolution on Earth, some of which evolved and many more of which vanished, I indeed find it hard to believe that evolution towards higher intelligence is the result of a basically zero-probability event. As for conceiving that similar levels of intelligence exist on other planets, it seems more likely than not that life took on average the same span of time to appear and to evolve there, and thus that other inhabited planets equally lack the means to communicate across galaxies. Or that the signals they managed to send earlier than ours have yet to reach us. Or will reach Earth long after the last form of intelligent life here has vanished…

modern cosmology as a refutation of theism

Posted in Books on June 23, 2014 by xi'an

Central Park, New York, Sep. 25, 2011

While I thought the series run by The Stone on the philosophy [or lack thereof] of religions was over, it seems there are more entries.  This week, I read with great pleasure the piece written by Tim Maudlin on the role played by recent results in (scientific) cosmology in refuting theist arguments.

“No one looking at the vast extent of the universe and the completely random location of homo sapiens within it (in both space and time) could seriously maintain that the whole thing was intentionally created for us.” T. Maudlin

What I particularly liked in his arguments is the role played by randomness, with an accumulation of evidence for the random nature and location of Earth and of human beings, which appear more and more as marginal to the Universe rather than as the main reason for its existence. And his clear rejection of fine-tuned cosmological constants as an argument in favour of the existence of a watchmaker. (An argument that was also deconstructed in Seber's book.) And obviously his final paragraph, that "Atheism is the default position in any scientific inquiry". This may be the strongest entry in the whole series.