Just to bounce on your “more general point”: there is ongoing work relating ABC to both GMM and EL, some of which I am involved in, and this seems like a sound approach to me. We use a partly defined model and build computational methods to deal with it; there is nothing truly Bayesian in the approach, and it simply defines a new kind of inference. There are also perspectives on ABC that consider the whole model as given, for which ABC provides a low-information solution due to the complexity of the model. It does not mean throwing away part of the model or making no assumption on . On the contrary, is well-defined…

Thanks for posting about this. There are two separate points that I’d like to address – one specific to our method, the other on the philosophy of ABC approaches more generally.

On ABC-EP, and just to elaborate on what Nicolas wrote: we are not claiming that ABC-EP is a silver bullet. It can work much, much better than ABC samplers on relatively well-behaved models, and it can fail miserably on troublesome problems. In our third example we have a large dataset, a 33-dimensional parameter space and a rather complex scientific model, and ABC-EP gives you something reasonable. It’d be a *lot* of work to get a regular ABC sampler to behave well in such a case.

Now there are many cases in which ABC-EP won’t work, including (a) a bad model (very small acceptance probability), (b) a bad prior (same), and (c) multimodality. All of these spell trouble for every ABC method I know of.

Multimodality is, I think, a real problem if we are going to apply ABC to real-world scientific models.

I think there are two kinds of multimodality. One is the kind you see in mixture models, and it shows up as an artifact of parameterisation: swap the labels of your mixture components and you get the same underlying object. Your posterior over the space of distributions has only one peak; it is the parameterisation that is the issue. So in practice, ignoring the other peaks (as in variational Bayes for mixtures) works well enough for prediction purposes.
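A tiny numerical illustration of this first kind (a made-up two-component Gaussian mixture, not from the paper): permuting the component labels leaves the likelihood – and hence the posterior over distributions – exactly unchanged, even though the two parameterisations sit at different points of parameter space.

```python
import math

def mixture_loglik(data, weights, means, sds):
    """Log-likelihood of a Gaussian mixture (naive, for illustration only)."""
    ll = 0.0
    for x in data:
        dens = sum(
            w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, sds)
        )
        ll += math.log(dens)
    return ll

data = [-1.2, -0.8, 0.1, 1.9, 2.3]
# Parameterisation A and its label-swapped twin B describe the same distribution
ll_a = mixture_loglik(data, [0.4, 0.6], [-1.0, 2.0], [0.5, 0.7])
ll_b = mixture_loglik(data, [0.6, 0.4], [2.0, -1.0], [0.7, 0.5])
assert abs(ll_a - ll_b) < 1e-12  # identical likelihood: two posterior modes, one object
```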

The other kind of multimodality appears when doing statistical inference for scientific (not statistical) models. That’s when your model is expressive enough to include qualitatively different scenarios that could explain the data just as well. For example (I’m making this up), you have a dynamical model for economic growth, and the same data could be explained by a large effect of education and a small effect of health, or a large effect of health and a small one of education. In other words, you are trying to do model comparison through parameter inference. It’s asking too much of the method – the problem is scientific, not statistical.

My guess is that we’ll see a lot of that in applications of ABC, and I don’t think any of the methods will be any good at coping with the issue.

My even more general point relates to your comment that “The current approach to ABC is to consider p(θ|s(y)) as a target per se, not as an approximation to p(θ|y)”.

As far as I understand it, the philosophy behind considering p(θ|s(y)) as a genuine target in itself is that you only trust your model to tell you about the summary statistics s(y) rather than about y itself. I’m not sure what to think about this.

I think it may be useful to contrast ABC to Generalised Method of Moments and Empirical Likelihood approaches.

In these cases you are also assuming that your model only tells you about some aspects of the data – for example, you could have a model that only expresses that higher education has a positive effect on mean GDP. Crucially, however, you make no assumptions about p(s(y)|θ). What’s a bit strange about the ABC philosophy is that we say we trust the model to say something useful only about the summary statistics, yet we trust it enough to get the *distribution* of these summary statistics right. If that’s the case, then why couldn’t it get the distribution of the whole data right?
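To make the contrast concrete, here is a minimal rejection-ABC sketch (a toy Gaussian example of my own, not from the paper). The sampler only ever compares summary statistics, so everything in y outside s(y) is discarded – yet the simulator is still trusted to produce the full distribution of s(y) under each θ.

```python
import random

def abc_rejection(s_obs, prior_sample, simulate, summary, eps, n_draws):
    """Rejection ABC: keep theta whenever the simulated summary lands within
    eps of the observed one. The target is p(theta | s(y)), not p(theta | y)."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        s_sim = summary(simulate(theta))
        if abs(s_sim - s_obs) < eps:
            accepted.append(theta)
    return accepted

# Toy example: infer the mean of a Gaussian from the sample mean alone.
random.seed(0)
y_obs = [random.gauss(1.5, 1.0) for _ in range(50)]
summary = lambda y: sum(y) / len(y)                    # s(y): sample mean
simulate = lambda th: [random.gauss(th, 1.0) for _ in range(50)]
prior = lambda: random.uniform(-5, 5)
post = abc_rejection(summary(y_obs), prior, simulate, summary,
                     eps=0.1, n_draws=20_000)
print(len(post), sum(post) / len(post))  # accepted draws concentrate near the true mean
```

Note that the small `eps` is what makes the acceptance probability collapse under a bad model or a bad prior, which is failure mode (a)/(b) above.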

Yes, ABC-EP should not work well if the posterior is multi-modal. (I wonder which ABC method would work well in such a case.)

The plot you selected partly answers your last point: the dashed line corresponds to MCMC-ABC, using the “best” sufficient statistic that Peters et al. found (the others performed much worse). In our experiments, the bias introduced by the summary statistics is far larger than the EP bias.

Thanks for discussing our paper.