## ABC via regression density estimation

**F**an, Nott, and Sisson recently posted on arXiv a paper entitled *Approximate Bayesian computation via regression density estimation*. The theme of the paper is that one could take advantage of the joint simulation of the pair parameter/sample to derive a non-parametric estimate of the conditional distribution of the summary statistic given the parameter, i.e. the sampling distribution. While most or even all regular ABC algorithms implicitly or explicitly rely on some level of non-parametric estimation, from Beaumont et al.’s (2002) non-parametric regression to the direct derivations of non-parametric convergence speeds for the kernel bandwidths in Blum and François (2010), Fearnhead and Prangle (2012), and Biau et al. (2012), this paper centres on the idea of using the very simulations ABC relies upon to build an estimate of the sampling distribution, to be used afterwards as the likelihood in either Bayesian or frequentist inference. Rather than relying on traditional kernel estimates, the adopted approach merges *mixtures of experts*, namely normal regression mixtures with logit weights (Jordan and Jacobs, 1994), for the marginals, with a copula representation of the joint distribution of the summary statistics.
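To make the idea concrete, here is a minimal sketch of using joint simulations of (parameter, summary statistic) to estimate the sampling distribution and then treating that estimate as a likelihood. It is not the authors' method: the toy normal model, the prior, and the names `approx_loglik`, `s_obs`, and `theta_hat` are all assumptions made for illustration, and a single Gaussian linear regression stands in for the mixture of experts and copula construction of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (assumed for illustration): prior theta ~ N(0, 3^2),
# data are n_obs iid N(theta, 1) draws, summary statistic = sample mean.
n_obs, n_sim = 20, 5000
theta = rng.normal(0.0, 3.0, size=n_sim)        # parameters drawn from the prior
s = rng.normal(theta, 1.0 / np.sqrt(n_obs))     # simulated summaries, s | theta

# Regression density estimate of s | theta: one Gaussian "expert",
# s | theta ~ N(a + b * theta, sigma2), fitted by least squares.
X = np.column_stack([np.ones(n_sim), theta])
coef, *_ = np.linalg.lstsq(X, s, rcond=None)
resid = s - X @ coef
sigma2 = resid.var(ddof=2)                      # residual variance, 2 fitted coefficients

def approx_loglik(theta_value, s_obs):
    """Log of the estimated sampling density of s_obs given theta_value."""
    mean = coef[0] + coef[1] * theta_value
    return -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (s_obs - mean) ** 2 / sigma2

# The estimate is built once and can then be reused for any observed summary,
# here to get an approximate maximum likelihood estimate on a grid.
s_obs = 1.2
grid = np.linspace(-3.0, 5.0, 401)
theta_hat = grid[np.argmax(approx_loglik(grid, s_obs))]
```

In this toy case the true sampling density is itself Gaussian and linear in the parameter, so the fit is essentially exact; the point of the richer mixture model in the paper is to handle cases where it is not.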

**S**o this is a new kid on the large block of ABC methods! In terms of computing time, it sounds roughly equivalent to regular ABC algorithms in that it relies on the joint simulation of the pair parameter/sample, plus a certain number of mixture and mixture-of-experts estimations. I have no intuition on how costly those estimations are. In their single illustration, the authors report density estimation in dimension 115, which is clearly impressive. I did not see any indication of the respective computing times. In terms of inference and connection with the exact Bayesian posterior, I see a few potential caveats. First, the method provides an approximation of the conditional density of the summary statistics given the parameters, while the Bayesian approach considers the opposite, i.e. the distribution of the parameters given the summary statistics. This could induce inefficiencies when the prior is vague and leads to a very wide spread for the values of the summary statistics. Using a neighbourhood of the observed statistics to restrict the range of the simulated statistics thus seems appropriate. (But this boils down to our more standard ABC, doesn’t it?!) Second, the use of mixtures of experts assumes some linear connection between the parameters and the summary statistics: while this reflects Fearnhead and Prangle’s (2012) strategy, it is not necessarily appropriate in settings where those parameters cannot be expressed directly as expectations of the summary statistics (see, e.g., the case of population genetics). Third, the approximation proposed by the paper is a plug-in estimate, whose variability and imprecision are not accounted for in the inference process. Maybe not a major issue, as other solutions also rely on plug-in estimates. And I note the estimation is done once and for all, in contrast with, e.g., our empirical likelihood solution, which requires a (fast) optimisation for each new value of the parameters. Fourth, once the approximation is constructed, a new MCMC run is required and, since the (approximated) target is a black box, the calibration of the MCMC sampler may turn out to be rather delicate, as in the 115-dimensional example.
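The fourth point can be illustrated with a small sketch of the extra MCMC run: a random-walk Metropolis sampler targeting the prior times an already fitted approximate likelihood. Everything here is assumed for illustration, including the fitted values `A`, `B`, `SIGMA2`, the observed summary `S_OBS`, the prior scale, and the proposal scale `step`; in one dimension the calibration is easy, which is exactly what need not hold in a 115-dimensional problem.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical output of a previous density estimation step:
# s | theta ~ N(A + B * theta, SIGMA2). These values are made up.
A, B, SIGMA2 = 0.0, 1.0, 0.05
S_OBS = 1.2          # hypothetical observed summary statistic
PRIOR_SD = 3.0       # assumed N(0, PRIOR_SD^2) prior on theta

def log_target(theta):
    """Unnormalised log posterior: log prior + approximate log likelihood."""
    log_prior = -0.5 * (theta / PRIOR_SD) ** 2
    log_lik = -0.5 * (S_OBS - (A + B * theta)) ** 2 / SIGMA2
    return log_prior + log_lik

def random_walk_metropolis(log_target, start, step, n_iter, rng):
    """Random-walk Metropolis with Gaussian proposals of scale `step`."""
    chain = np.empty(n_iter)
    current, current_lp = start, log_target(start)
    accepted = 0
    for i in range(n_iter):
        proposal = current + step * rng.normal()
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain[i] = current
    return chain, accepted / n_iter

chain, acc_rate = random_walk_metropolis(log_target, 0.0, 0.5, 20000, rng)
posterior_mean = chain[2000:].mean()   # discard the first 2000 draws as burn-in
```

The proposal scale `step` is the tuning parameter in question: with a black-box target, a poor choice shows up only through the acceptance rate and the mixing of the chain.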

December 27, 2012 at 7:50 pm

I think that the idea of using mixture models to learn about the joint distribution of (parameter, data) is a very good one. It looks to me that this was actually introduced in another paper (Bonassi et al., 2011). Xian, have you written any post on that paper?

December 20, 2012 at 3:13 pm

I have updated my answer on CV to include this post, which seems to be related to the method I described there:

http://stats.stackexchange.com/questions/37729/how-to-do-estimation-when-only-summary-statistics-are-available/37734#37734

December 19, 2012 at 12:35 am

Xian, sorry, I meant an estimate of the sampling distribution, as you correctly point out in your thread.

December 19, 2012 at 12:31 am

I like the idea. However, as a person who has to deal with frequentist statisticians in his department, I would be careful about calling the construction an “approximation to the sampling distribution”.