## expectation-propagation and ABC

“It seems quite absurd to reject an EP-based approach, if the only alternative is an ABC approach based on summary statistics, which introduces a bias which seems both larger (according to our numerical examples) and more arbitrary, in the sense that in real-world applications one has little intuition and even less mathematical guidance on to why p(θ|s(y)) should be close to p(θ|y) for a given set of summary statistics s.”

Simon Barthelmé and Nicolas Chopin posted a recent arXiv paper on Expectation-Propagation for Summary-Less, Likelihood-Free Inference. They sell expectation-propagation as a quick and dirty version of ABC, avoiding the selection of summary statistics by using the constraint

$||y_i-y^\star_i||\le \epsilon$

on each component of the simulated pseudo-data vector y*, y being the actual data. Expectation-propagation is a variational technique [Simon and Nicolas are quite fond of!] and it consists of replacing the target with the “closest” member of an exponential family, like the Gaussian distribution. The expectation-propagation approximation is found by including a single “observation” at a time, using the current approximations of the other observations' contributions in place of the prior, and finding the best Gaussian in this pseudo-model. In addition, expectation-propagation provides an approximation of the evidence. In the “likelihood-free” setting (I do not like this term because we are dealing with a specific well-defined likelihood, we simply cannot compute it!), this means computing an empirical mean and an empirical variance, one observation at a time, under the above tolerance constraint.
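To make the one-observation-at-a-time update concrete, here is a minimal sketch on a toy model of my own choosing, not one of the examples in the paper: i.i.d. observations y_i ~ N(θ, 1) with a Gaussian prior on θ. The function name, the tolerance, the number of simulations and the skip rules are all illustrative assumptions, and the evidence approximation is left out.

```python
import numpy as np

def ep_abc_gaussian_mean(y, eps=0.5, prior_mean=0.0, prior_var=100.0,
                         n_sims=20000, n_passes=2, seed=0):
    """Toy EP-ABC sketch for y_i ~ N(theta, 1) (hypothetical example)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Gaussians are stored in natural parameters:
    # r = precision, b = precision * mean.
    r0, b0 = 1.0 / prior_var, prior_mean / prior_var
    site_r = np.zeros(n)  # one Gaussian "site" per observation
    site_b = np.zeros(n)
    for _ in range(n_passes):
        for i in range(n):
            # Pseudo-prior: the prior times every site except site i.
            cav_r = r0 + site_r.sum() - site_r[i]
            cav_b = b0 + site_b.sum() - site_b[i]
            if cav_r <= 0.0:
                continue  # improper pseudo-prior, skip this update
            theta = rng.normal(cav_b / cav_r, np.sqrt(1.0 / cav_r), size=n_sims)
            # Simulate one pseudo-observation per theta; keep the thetas
            # satisfying the tolerance constraint |y_i - y_i*| <= eps.
            y_star = rng.normal(theta, 1.0)
            accepted = theta[np.abs(y_star - y[i]) <= eps]
            if accepted.size < 50:
                continue  # too few acceptances for stable moments
            # Best Gaussian in the pseudo-model: match empirical moments.
            m, v = accepted.mean(), accepted.var(ddof=1)
            site_r[i] = 1.0 / v - cav_r
            site_b[i] = m / v - cav_b
    post_r = r0 + site_r.sum()
    post_b = b0 + site_b.sum()
    return post_b / post_r, 1.0 / post_r  # approximate posterior mean, variance
```

Called on a vector of observations, it returns the mean and variance of the Gaussian approximation. The only simulation step is drawing (θ, y_i*) pairs from the pseudo-prior and the model, and no summary statistic appears anywhere.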

Unless I am confused, the expectation-propagation approximation to the posterior distribution is a [sequentially updated] Gaussian distribution, which means that it will only be appropriate in cases where the posterior distribution is approximately Gaussian. Since the three examples processed in the paper are of this kind, e.g. the above reproduction, I wonder about the performance of the expectation-propagation method in less smooth cases, such as ridge-like or multimodal posteriors. The authors mention two limitations: “First, it [EP] assumes a Gaussian prior; and second, it relies on a particular factorisation of the likelihood, which makes it possible to simulate sequentially the datapoints”, but those seem negligible wrt my above comment. I thus remain unconvinced by the concluding sentence quoted above. (The current approach to ABC is to consider p(θ|s(y)) as a target per se, not as an approximation to p(θ|y).) Nonetheless, expectation-propagation constitutes a quick approximation method that can always be used as a reference against other approximations.
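The concern about multimodal posteriors can be illustrated with a small sketch, assuming an idealised bimodal target (an equal-weight mixture of N(-3, 1) and N(3, 1), which is my own choice and not an example from the paper). It computes the Gaussian with the same mean and variance as the target, which is the best Gaussian in the moment-matching sense, rather than running the full EP-ABC algorithm.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def bimodal_pdf(x, sep=3.0):
    """Equal-weight mixture of N(-sep, 1) and N(sep, 1) (hypothetical target)."""
    return 0.5 * normal_pdf(x, -sep, 1.0) + 0.5 * normal_pdf(x, sep, 1.0)

def moment_matched_gaussian(sep=3.0):
    """Mean and variance of the mixture, i.e. of its Gaussian moment match."""
    # Mean is 0 by symmetry; variance is within-component variance plus sep^2.
    return 0.0, 1.0 + sep ** 2

mean, var = moment_matched_gaussian()
# How much the Gaussian fit overstates the target density at its own centre.
ratio = normal_pdf(mean, mean, var) / bimodal_pdf(mean)
print(f"Gaussian fit: mean={mean}, variance={var}")
print(f"density ratio (fit / target) at the fitted mean: {ratio:.1f}")
```

The fitted Gaussian is centred in the valley between the two modes, where the target has almost no mass, so the ratio is well above one. The mean and variance are matched but the shape is not.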

### 6 Responses to “expectation-propagation and ABC”

1. […] are X’s comments on a paper, “Expectation-Propagation for Likelihood-Free Inference,” by Simon Barthelme […]

2. [...] work in this direction. First, and more briefly, I’ll present the ABC-EP algorithm (Chopin and Barthelmé, 2011). I’ll also discuss some possible future research in ABC theory. Second, I’ll discuss [...]

3. [...] universe, however I remain unconvinced by the universality of the target, as approximations such as EP and variational Bayes need to be introduced for the fast computation of the posterior distribution. [...]

4. Hi Christian,

Thanks for posting about this. There are two separate points that I’d like to address – one specific to our method, and the other more general, on the philosophy of ABC approaches.

On ABC-EP, and just to elaborate on what Nicolas wrote: we are not claiming that ABC-EP is a silver bullet. It can work much, much better than ABC samplers on relatively well-behaved models, and it can fail miserably on troublesome problems. In our third example we have a large dataset, a 33-dimensional parameter space and a rather complex scientific model, and ABC-EP gives you something reasonable. It’d be a *lot* of work to get a regular ABC sampler to behave well in such a case.

Now there are many cases in which ABC-EP won’t work, including a) a bad model (acceptance prob. very small) b) a bad prior (same) and c) multimodality. All of these spell trouble for all ABC methods I know about.

Multi-modality is I think a real problem if we are going to apply ABC to real-world scientific models.

I think there are two kinds of multimodality. One of them is the kind you see in mixture models, which shows up as an artifact of parameterisation. Swap the labels of your mixtures and you get the same underlying object. Your posterior over the space of distributions has only one peak; it is the parameterisation that’s the issue. So in practice ignoring the other peaks (as in variational Bayes for mixtures) works well enough for prediction purposes.

The other kind of multimodality appears when doing statistical inference for scientific (not statistical) models. That’s when your model is expressive enough to include qualitatively different scenarios that could explain the data just as well. For example (I’m making this up), you have a dynamical model for economic growth and the same data could be explained by having a large effect for education and a small effect for health, or a large effect of health and a small one for education. In other words, you are trying to do model comparison through parameter inference. It’s asking too much of the method – the problem is scientific and not statistical.

My guess is that we’ll see a lot of that in applications of ABC, and I don’t think any of the methods will be any good at coping with the issue.

My even more general point relates to your comment that “The current approach to ABC is to consider p(θ|s(y)) as a target per se, not as an approximation to p(θ|y)”.

As far as I understand the philosophy behind considering $p(\theta|s(y))$ as a genuine target in itself is the idea that you only trust your model to tell you about the summary statistics s(y) rather than y itself. I’m not sure what to think about this.
I think it may be useful to contrast ABC to Generalised Method of Moments and Empirical Likelihood approaches.

In these cases you are also assuming that your model only tells you about some aspects of the data – for example, you could have a model that only expresses that higher education has a positive effect on mean GDP. However, crucially, you make no assumptions about $p(s(y)|\theta)$. What’s a bit strange about the ABC philosophy is that we are saying we trust the model to say something useful only about summary statistics, but we trust it enough to get the *distribution* of these summary statistics right. If that’s the case then why couldn’t it get the distribution for the whole data right?

• Just to bounce on your “more general point”: there is on-going work relating ABC with both GMM and EL, some of which I am involved in, and this seems like a sound approach to me. We are using a partly defined model and building computational methods to deal with this; there is nothing truly Bayesian in the approach and it simply defines a new kind of inference. There are also perspectives on ABC that consider the whole model as given, for which ABC provides a low-information solution, due to the complexity of the model. It does not mean throwing away part of the model or making no assumption on $p(s(y)|\theta)$. On the contrary, $p(s(y)|\theta)$ is well-defined…

5. Nicolas Chopin Says:

Note that EP can use any exponential family as an approx class; multivariate Gaussian is just the most obvious example.
Yes, ABC-EP should not work well if the posterior is multi-modal. (I wonder which ABC method would work well in such a case.)
The plot you selected kind of answers your last point: the dashed line corresponds to MCMC-ABC, using the “best” sufficient stat that Peters et al. found (others were performing much worse). In our experiments, the bias introduced by summary stats is far larger than the EP bias.
Thanks for discussing our paper.