ABC via regression density estimation
Fan, Nott, and Sisson recently posted on arXiv a paper entitled Approximate Bayesian computation via regression density estimation. The theme of the paper is that one could take advantage of the joint simulation of the pair parameter/sample to derive a non-parametric estimate of the conditional distribution of the summary statistic given the parameter, i.e. the sampling distribution. While most or even all regular ABC algorithms do implicitly or explicitly rely on some level of non-parametric estimation, from Beaumont et al.’s (2002) non-parametric regression to Blum and François‘s (2010), Fearnhead and Prangle (2012), and Biau et al. (2012) direct derivations on non-parametric convergence speeds on the kernel bandwidths, this paper centres on the idea to use those same simulations ABC relies upon to build an estimate of the sampling distribution, to be used afterwards as the likelihood in either Bayesian or frequentist inference. Rather than relying on traditional kernel estimates, the adopted approach merges mixtures of experts, namely normal regression mixtures with logit weights (Jordan and Jacobs, 1994) for the marginals, along with a copula representation of the joint distribution (of the summary statistics).
So this is a new kid on the large block of ABC methods! In terms of computing time, it sounds roughly equivalent to regular ABC algorithms in that it relies on the joint simulation of the pair parameter/sample. Plus a certain number of mixtures/mixtures of experts estimations. I have no intuition on how greedy those estimations are. In their unique illustration, the authors report density estimation in dimension 115, which is clearly impressive. I did not see any indication of respective computing times. In terms of inference and connection with the Bayesian exact posterior, I see a few potential caveats: first, the method provides an approximation of the conditional density of the summary statistics given the parameters, while the Bayesian approach considers the opposite. This could induce inefficiencies when the prior is vague and leads to a very wide spread for the values of the summary statistics. Using a neighbourhood of the observed statistics to restrict the range of the simulated statistics thus seems appropriate. (But boils down to our more standard ABC, isn’t it?!) Second, the use of mixtures of experts assume some linear connection between the parameters and the summary statistics: while this reflects Fearnhead and Prangle’s (2012) strategy, this is not necessarily appropriate in settings where those parameters cannot be expressed directly as expectations of the summary statistics (see, e.g., the case of population genetics). Third, the approximation proposed by the paper is a pluggin estimate, whose variability and imprecision are not accounted for in the inference process. Maybe not a major issue, as other solutions also rely on pluggin estimates. And I note the estimation is done once for all, when compared with, e.g., our empirical likelihood solution that requires a (fast) optimisation for each new value of the parameters. Fourth, once the approximation is constructed, a new MCMC run is required and since the (approximated) target is a black box the calibration of the MCMC sampler may turn to be rather delicate, as in the 115 dimensional example.