## Bayesian optimization for likelihood-free inference of simulator-based statistical models

Michael Gutmann and Jukka Corander arXived this paper two weeks ago. I read part of it (mostly the extended introduction) on the flight from Edinburgh to Birmingham this morning. I find the reflection it contains on the nature of the ABC approximation quite deep and thought-provoking. Indeed, the major theme of the paper is to visualise ABC (which is admittedly shorter than “likelihood-free inference of simulator-based statistical models”!) as a regular computational method based on an approximation of the likelihood function at the observed value, y_{obs}. This includes, for example, Simon Wood’s synthetic likelihood (Simon incidentally gave a talk on his method while I was in Oxford), as well as non-parametric versions. In both cases, the approximations are based on repeated simulations of pseudo-datasets for a given value of the parameter θ, either to produce an estimate of the mean and covariance of the sampling model as a function of θ or to construct genuine estimates of the likelihood function. As assumed by the authors, this calls for a low-dimensional θ. This approach actually allows for the inclusion of the synthetic approach as a lower bound on a non-parametric version.
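To fix ideas, here is a minimal sketch of the synthetic likelihood idea described above, using only the Python standard library. It is not the authors' code nor Wood's implementation. The toy simulator (a N(θ, 1) sample summarised by its mean), the function names `simulate_summary` and `synthetic_loglik`, and the values of `n`, `m` and the grid are all assumptions chosen for illustration, with a single summary so that the "covariance" reduces to a variance.

```python
import math
import random
import statistics


def simulate_summary(theta, rng, n=20):
    """Hypothetical toy simulator: n draws from N(theta, 1), summarised by their mean."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


def synthetic_loglik(theta, s_obs, rng, m=200):
    """Gaussian (synthetic) log-likelihood of the observed summary at theta.

    The mean and variance of the summary are estimated from m pseudo-datasets
    simulated at this value of theta.
    """
    sims = [simulate_summary(theta, rng) for _ in range(m)]
    mu = statistics.fmean(sims)
    var = statistics.variance(sims)  # unbiased sample variance
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (s_obs - mu) ** 2 / var


rng = random.Random(0)
s_obs = 1.0  # assumed observed summary
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
values = {theta: synthetic_loglik(theta, s_obs, rng) for theta in grid}
best = max(values, key=values.get)  # grid point with the largest synthetic log-likelihood
print(best, values)
```

Each grid point requires its own batch of `m` simulations, which is exactly the cost that makes a low-dimensional θ necessary.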

In the case of Wood’s synthetic likelihood, two questions came to me:

- the estimation of the mean and covariance functions is usually not smooth because new simulations are required for each new value of θ. I wonder how frequent the case is where we can always use the same basic random variates for all values of θ, because it would then give a smooth version of the above. In the other cases, provided the dimension is manageable, a Gaussian process could be fitted first before using the approximation. Or any other form of regularization.
- no mention is made [in the current paper] of the impact of the parametrization of the summary statistics. Once again, a Box-Cox transform could be applied to each component of the summary to bring it closer to the normal distribution.
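The first point can be illustrated with a small sketch, assuming a toy location model where the simulator output is θ plus standard normal noise (my assumption, not an example from the paper). Reusing the same seed for every θ means the same basic random variates drive all simulations, so the estimated mean summary becomes a smooth function of θ. The names `summary_mean`, `with_common_variates` and `with_fresh_variates` are hypothetical.

```python
import random
import statistics


def summary_mean(theta, rng, m=50, n=20):
    """Monte Carlo estimate of the mean summary at theta (toy model: theta + N(0, 1) noise)."""
    sims = [
        statistics.fmean([theta + rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(m)
    ]
    return statistics.fmean(sims)


def with_common_variates(theta, seed=42):
    """Re-seed for every theta, so the same basic variates are reused."""
    return summary_mean(theta, random.Random(seed))


fresh_rng = random.Random(1)


def with_fresh_variates(theta):
    """Keep drawing from one generator, so every theta gets new variates."""
    return summary_mean(theta, fresh_rng)


grid = [i / 10 for i in range(11)]
crn = [with_common_variates(t) for t in grid]
fresh = [with_fresh_variates(t) for t in grid]

# Increments between neighbouring grid points: constant with common variates,
# perturbed by Monte Carlo noise with fresh ones.
crn_steps = [b - a for a, b in zip(crn, crn[1:])]
fresh_steps = [b - a for a, b in zip(fresh, fresh[1:])]
print(crn_steps)
print(fresh_steps)
```

In this location model the noise cancels exactly between neighbouring values of θ. For a general simulator the cancellation is only partial, and it requires the simulator to be written as a deterministic function of θ and the basic variates, which is the frequency question raised above.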

When reading about a non-parametric approximation to the likelihood (based on the summaries), the questions I scribbled on the paper were:

- estimating a complete density when using this estimate at the single point y_{obs} could possibly be superseded by a more efficient approach.
- the authors study a kernel that is a function of the difference or distance between the summaries and which is maximal at zero. This is indeed rather frequent in the ABC literature, but does it impact the convergence properties of the kernel estimator?
- the estimation of the tolerance, which happens to be a bandwidth in that case, does not appear to be addressed in this paper, which could explain the very low probabilities of acceptance mentioned in the paper.
- I am lost as to why lower bounds on likelihoods are relevant here. Unless this is intended for ABC maximum likelihood estimation.
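The link between tolerance, bandwidth and acceptance probability in the questions above can be sketched as follows, again with a toy Gaussian simulator of my own choosing rather than anything taken from the paper. With a uniform kernel, the kernel estimate of the summary density at the observed value is the acceptance rate divided by the window width 2h. The function `kernel_likelihood` and the values of `h` and `m` are assumptions for illustration.

```python
import random
import statistics


def simulate_summary(theta, rng, n=20):
    """Hypothetical toy simulator: mean of n draws from N(theta, 1)."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


def kernel_likelihood(theta, s_obs, h, rng, m=2000):
    """Uniform-kernel estimate of the summary density at s_obs.

    Returns (likelihood estimate, acceptance rate), where a simulation is
    "accepted" when its summary falls within h of the observed summary.
    """
    hits = sum(abs(simulate_summary(theta, rng) - s_obs) <= h for _ in range(m))
    accept_rate = hits / m
    return accept_rate / (2 * h), accept_rate


rng = random.Random(0)
results = {h: kernel_likelihood(1.0, 1.0, h, rng) for h in (0.5, 0.1, 0.02)}
for h, (lik, rate) in results.items():
    print(f"h={h}: likelihood estimate {lik:.3f}, acceptance rate {rate:.3f}")
```

In this toy setting the summary is N(θ, 1/20), so its density at the mode is about 1.78. The widest window underestimates it substantially, while the narrowest gets close at the price of accepting only a small fraction of the simulations.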

Gutmann and Corander also comment on the first point, through the cost of producing a likelihood estimator. They therefore suggest resorting to regression and avoiding regions of low estimated likelihood, relying on Bayesian optimisation. (Hopefully to be commented later.)

February 4, 2015 at 4:44 pm

* I find the reflection it contains on the nature of the ABC approximation quite deep and thought-provoking.

We are glad that our summary of LFI/ABC was stimulating and not an old story. While apparently fresh it is not the complete story though: the actual contributions are deeper down the paper, in the later sections.

* ABC […] as a regular computational method based on an approximation of the likelihood function at the observed value, yobs.

It seems that the sentence is a mixture of two sentences:

1) ABC as a regular computational method based on an approximation of the likelihood function.

2) ABC as a regular computational method based on an approximation of the pdf at the observed value, yobs.

We agree that both sentences describe what we were doing in the paper, but their mixture is not accurate any more.

* Approximate Bayesian Computation (ABC) vs Likelihood-Free Inference (LFI)

Regarding “[…] ABC (which is admittedly shorter than “likelihood-free inference of simulator-based statistical models”!) […] ” :

ABC and LFI seem to be often used like synonyms. But don’t they actually emphasize different things? ABC puts emphasis on Bayesian inference while LFI is silent about the general inference framework employed. LFI, on the other hand, puts emphasis on the fact that the likelihood cannot be used (since intractable) while ABC is silent about the reason for the approximation.

LFI may also have a more general connotation than ABC: Inference methods for models with intractable partition functions (unnormalized models), e.g. those discussed on the blog here https://xianblog.wordpress.com/2014/05/28/estimating-normalising-constants-mea-culpa/ may also be considered LFI methods. This is a reason why we added the “simulator-based models” to the title.

* no mention is made [in the current paper] of the impact of the parametrization of the summary statistics.

We consider the summary statistics as given because in this paper, we are not concerned with the choice of the summary statistics but with the (computational) difficulties in ABC/LFI. However, this does of course not mean that the choice of the summary statistics is not important. In fact, we have worked on that topic as well: http://arxiv.org/abs/1407.4981

* the estimation of the tolerance, which happens to be a bandwidth in that case, does not appear to be processed in this paper, which could explain for very low probabilities of acceptance mentioned in the paper.

While not the focus of the paper, the choice of the bandwidth is discussed in Section 5.3.

The low acceptance probabilities are mentioned in Example 6: The example illustrates that smaller bandwidths yield better approximations of the likelihood. It seems though that in the standard approach a more accurate approximation of the likelihood is tied to a decrease of the acceptance probability (see also Fig 4), and that different strategies to choose the bandwidth cannot change that.

* I am lost as to why lower bounds on likelihoods are relevant here. Unless this is intended for ABC maximum likelihood estimation.

The lower bounds are relevant because

– they allow to link the synthetic approach to the nonparametric approach,

– they can indeed be used for likelihood-free point estimation without having to choose bandwidths (see e.g. Example 7 or Figure 12),

– they shift the focus to the discrepancies and the regression-based approximations of the likelihood, which are important for the later parts of the paper.

Happy continued reading,

Michael and Jukka

February 3, 2015 at 6:55 pm

“ABC (which is admittedly shorter than “likelihood-free inference of simulator-based statistical models”!) as a regular computational method based on an approximation of the likelihood function at the observed value”

I really agree with this view, and I think the concentration of the community on the “B” within ABC (sampling algorithms …) has unfortunately prevented a more holistic view of ABC as just one example of what I would prefer to call simulation-based inference instead of “likelihood-free inference” (there is a likelihood after all, just not tractable).

We tried to visualize this in http://onlinelibrary.wiley.com/doi/10.1111/j.1461-0248.2011.01640.x/full

Not sure if this figure will show up: