## ABC and cosmology

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , on May 4, 2015 by xi'an

Two papers appeared on arXiv in the past two days with the similar theme of applying ABC-PMC [one version of which we developed with Mark Beaumont, Jean-Marie Cornuet, and Jean-Michel Marin in 2009] to cosmological problems. (As a further coincidence, I had just started refereeing yet another paper on ABC-PMC in another astronomy problem!) The first paper cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation by Ishida et al. [“et al” including Ewan Cameron] proposes a Python ABC-PMC sampler with applications to galaxy clusters catalogues. The paper is primarily a description of the cosmoabc package, including code snapshots. Earlier occurrences of ABC in cosmology are found for instance in this earlier workshop, as well as in Cameron and Pettitt earlier paper. The package offers a way to evaluate the impact of a specific distance, with a 2D-graph demonstrating that the minimum [if not the range] of the simulated distances increases with the parameters getting away from the best parameter values.

“We emphasis [sic] that the choice of the distance function is a crucial step in the design of the ABC algorithm and the reader must check its properties carefully before any ABC implementation is attempted.” E.E.O. Ishida et al.

The second [by one day] paper Approximate Bayesian computation for forward modelling in cosmology by Akeret et al. also proposes a Python ABC-PMC sampler, abcpmc. With fairly similar explanations: maybe both samplers should be compared on a reference dataset. While I first thought the description of the algorithm was rather close to our version, including the choice of the empirical covariance matrix with the factor 2, it appears it is adapted from a tutorial in the Journal of Mathematical Psychology by Turner and van Zandt. One out of many tutorials and surveys on the ABC method, of which I was unaware, but which summarises the pre-2012 developments rather nicely. Except for missing Paul Fearnhead’s and Dennis Prangle’s semi-automatic Read Paper. In the abcpmc paper, the update of the covariance matrix is the one proposed by Sarah Filippi and co-authors, which includes an extra bias term for faraway particles.

“For complex data, it can be difficult or computationally expensive to calculate the distance ρ(x; y) using all the information available in x and y.” Akeret et al.

In both papers, the role of the distance is stressed as being quite important. However, the cosmoabc paper uses an L1 distance [see (2) therein] in a toy example without normalising between mean and variance, while the abcpmc paper suggests using a Mahalanobis distance that turns the d-dimensional problem into a comparison of one-dimensional projections.

## summary statistics for ABC model choice

Posted in Statistics with tags , , , , , , , , , on March 11, 2013 by xi'an

A few days ago, Dennis Prangle, Paul Fernhead, and their co-authors from New Zealand have posted on arXiv their (long-awaited) study of the selection of summary statistics for ABC model choice. And I read it during my trip to England, in trains and planes, if not when strolling in the beautiful English countryside as above.

As posted several times on this ‘Og, the crux of the analysis is that the Bayes factor is a good type of summary when comparing two models, this result extending to more model by considering instead the vector of evidences. As in the initial Read Paper by Fearnhead and Prangle, there is no true optimality in using the Bayes factor or vector of evidences, strictly speaking, besides the fact that the vector of evidences is minimal sufficient for the marginal models (integrating out the parameters). (This was a point made in my discussion.) The implementation of the principle is similar to this Read Paper setting as well: run a pilot ABC simulation, estimate the vector of evidences, and re-run the main ABC simulation using this estimate as the summary statistic. The paper contains a simulation study using some of our examples (in Marin et al., 2012), as well as an application to genetic bacterial data. Continue reading

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on June 5, 2012 by xi'an

## ABC for missing data

Posted in Statistics, University life with tags , , , , , , on February 2, 2012 by xi'an

I received this email a few days ago:

I am an hard-core reader of your blog and thanks to your posts I have recently started being interested to ABC (and Bayesian methods in general). I am writing you to ask for suggestions on the application of the semi-automatic ABC à la Fearnhead & Prangle. The reason why I am contacting you instead of addressing the authors is because (i) you have been involved in the RSS reading of their paper and (ii) you are an authority on ABC, and therefore you are probably best suited and less biased on such issue. I am applying ABC with the semi-automatic statistics selection provided in Fearnhead and Prangle (2012) to a problem which can be formalized as a hidden Markov model. However I am unsure of whether I am making a huge mistake on the following point: let’s suppose we have an unobserved (latent) system state X (depending on an unknown parameter θ) and a corresponding “corrupted” version which is observed with some measurement error, e.g.

Y = X + ε,

where ε is the measurement error independent of X, ε is N(0, σ²), say. Now their setup does not consider measurement error, so I wonder if the following is correct. Since I can simulate n times some X’ from p(X|θ) am I allowed to use the corresponding “simulated” n corrupted measurements

Y’ = X’ + ε’

(where ε’ is a draw from p(ε|σ)) into their regression approach to determine a (vector of) summary statistic S=(S1,S2) for (θ,σ)? I mean the Y’ are draws from a p(y|X’,θ,σ) which is conditional on X’. Is this allowed? Wilkinson (2008) is the only reference I have found considering ABC with measurement-error (the ones by Dean et al (2011) and Jasra et al (2011) being too technical in my opinion to allow a practical implementation) and since he does not consider a summary statistics-based approach (e.g. Algorithm D, page 10) of course he is not in need to simulate the corrupted measurements but only the latent ones. Therefore I am a bit unsure on whether it is statistically ok to simulate Y’ conditionally on X’ or if there is some theoretical issue that makes this nonsense.

to which I replied

…about your model and question, there is no theoretical difficulty in simulating x’ then y’given x’ and a value of the parameters. The reason is that

$y' \sim \int f(y',x'|\theta) \text{d}x' = f(y'|\theta)$

.the proper marginal as defined by the model. Using the intermediate x’ is a way to bypass the integral but this is 100% correct!…

a reply followed by a further request for precision

Although your equation is clearly true, I am not sure I fully grasp the issue, so I am asking for confirmation. Yes, as you noticed I need a

y’ ~ f(y’|θ,σ)

Now it’s certainly true that I can generate a draw x’ from f(x’|θ,σ) and then plug such x’ into f(y’|x’,θ,σ) to generate y’. By proceeding this way I obtain a draw (y’,x’) from f(y’,x’|θ,σ). I think I understood your reasoning, on how by using the procedure above I am actually skipping the computation of the integral in:

$\int f(y',x'|\theta, \sigma) \text{d}x'.$

Is it basically the case that the mechanism above is just a classic simulation from a bivariate distribution, where since I am interested in the marginal f(y’|θ,σ) I simulate from the joint density f(y’,x’|θ,σ) and then discard the x’ output?

which is indeed a correct interpretation. When simulating from a joint, the marginals are by-products of the simulation.