## extending ABC to high dimensions via Gaussian copula

**L**i, Nott, Fan, and Sisson arXived last week a new paper on ABC methodology that I read on my way to Warwick this morning. The central idea in the paper is (i) to estimate the marginal posterior densities of the components of the model parameter by non-parametric means; and (ii) to consider all pairs of components in order to deduce the correlation matrix R of the Gaussian transforms of the pairwise rank statistics. From those two low-dimensional estimates, the authors derive a joint Gaussian-copula distribution by using inverse cdf transforms and the correlation matrix R, to end up with a meta-Gaussian representation

$$g(\theta) = \frac{1}{|R|^{1/2}}\,\exp\left\{-\frac{\eta^\mathsf{T}(R^{-1}-I_p)\,\eta}{2}\right\}\;\prod_{i=1}^p g_i(\theta_i)$$

where the η's are the Gaussian transforms of the inverse-cdf transforms of the θ's, that is,

$$\eta_i = \Phi^{-1}(G_i(\theta_i))$$

Or rather

$$\eta_i = \Phi^{-1}(\hat{G}_i(\theta_i))$$

given that the g's are estimated.

This is obviously an approximation to the joint in that, even in the most favourable case when the g's are perfectly estimated, and thus the transformed components perfectly Gaussian, the joint is not necessarily Gaussian… But it sounds quite interesting, provided the cost of running all those transforms is not overwhelming. For instance, if the g's are kernel density estimators, each evaluation involves a sum over a possibly large number of terms.
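To make the construction concrete, here is a minimal sketch of the two estimation steps and the resulting meta-Gaussian density. This is my own toy illustration, not the authors' implementation: the simulated stand-in for the ABC output, the use of scipy's `gaussian_kde` for the marginals, and the rank-based estimates of the marginal cdfs are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)

# Toy stand-in for accepted ABC draws of a p-dimensional parameter:
# correlated draws with non-Gaussian marginals
p, n = 3, 2000
z = rng.multivariate_normal(np.zeros(p), 0.5 * np.eye(p) + 0.5, size=n)
theta = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3, z[:, 2]])

# (i) non-parametric estimates of the marginal posterior densities g_i
kdes = [gaussian_kde(theta[:, i]) for i in range(p)]

# Gaussian transforms eta_i = Phi^{-1}(G_i(theta_i)), with the marginal
# cdfs G_i estimated by the rescaled ranks of the sample
ranks = theta.argsort(axis=0).argsort(axis=0) + 1
eta = norm.ppf(ranks / (n + 1))

# (ii) correlation matrix R of the Gaussian transforms
R = np.corrcoef(eta, rowvar=False)
R_inv = np.linalg.inv(R)

def meta_gaussian_logpdf(x):
    """Log of the Gaussian-copula (meta-Gaussian) density at a point x."""
    # eta at x, using the KDE-based cdf estimates
    e = norm.ppf([kde.integrate_box_1d(-np.inf, xi)
                  for kde, xi in zip(kdes, x)])
    return (-0.5 * np.log(np.linalg.det(R))
            - 0.5 * e @ (R_inv - np.eye(p)) @ e
            + sum(np.log(kde(xi)[0]) for kde, xi in zip(kdes, x)))
```

Note that each call to `meta_gaussian_logpdf` sums over all n kernel terms in every marginal, which is the transform cost mentioned above.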

One thing that bothers me in the approach, albeit mostly at a conceptual level since I realise its practical appeal, is the use of *different* summary statistics for approximating different uni- and bi-dimensional marginals. This makes for an incoherent joint distribution, again at a conceptual level, as I do not see immediate practical consequences… Those local summaries also have to be identified, component by component, which adds another layer of computational cost to the approach, even when using a semi-automatic approach as in Fearnhead and Prangle (2012). On the plus side, the whole algorithm relies on a single reference table.

The examples in the paper are (i) the banana-shaped “Gaussian” distribution of Haario et al. (1999) that we used in our PMC papers, with a twist; and (ii) a g-and-k quantile distribution. The twist in the banana (!) is that the banana distribution is the prior associated with the mean of a Gaussian observation. In that case, the meta-Gaussian representation seems to hold almost perfectly, even in p=50 dimensions. (If I remember correctly, the hard part in analysing the banana distribution was reaching the tails, which are extremely elongated in at least one direction.) For the g-and-k quantile distribution, the same holds, even for a regular ABC. What would be of further interest is to exhibit examples where the meta-Gaussian is clearly an approximation. If such cases exist.
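For readers unfamiliar with it, the twisted-Gaussian “banana” of Haario et al. is simulated by bending one coordinate of a Gaussian draw; the elongated tails come from the quadratic twist. A quick sketch (the values of b and σ² here are illustrative defaults, not necessarily the settings used in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_banana(n, p=2, b=0.03, sigma2=100.0):
    """Draw n points from a twisted-Gaussian 'banana': start from
    N(0, diag(sigma2, 1, ..., 1)) and bend the second coordinate."""
    x = rng.standard_normal((n, p))
    x[:, 0] *= np.sqrt(sigma2)               # long first axis
    x[:, 1] += b * (x[:, 0] ** 2 - sigma2)   # the quadratic 'twist'
    return x

x = sample_banana(10_000)
```

Plotting `x[:, 0]` against `x[:, 1]` shows the characteristic crescent, with the tails of the crescent carrying very little mass.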

April 28, 2015 at 8:35 am

Thanks Christian,

The copula ABC approximation for the banana distribution actually holds for p=250 dimensions in the paper (not p=50), but of course, by construction it will hold for p=∞, computing time and storage requirements notwithstanding!

The “incoherent” model (with different summary statistics for different univariate and bivariate margins) is a reasonable point, although it is simple enough to check that using a low-dimensional subset/function of the full vector of summary statistics gives a more precise marginal posterior estimate compared to the full vector in each case. (And if it doesn’t then don’t use it!) So in this sense, even any “incoherence” is actually an improvement when trading off against the otherwise poor vanilla ABC approximation.

In addition, there is some work in the density estimation literature (discussed in our previous paper on high-dimensional ABC in JCGS, 2014) that involves using different datasets to estimate different marginals for various parts of the posterior. So whatever inherent problems these techniques have, we inherit in this approach. Although to date, we have only experienced improvements over standard ABC using these approaches.

Scott