ABC and sufficient statistics

Chris Barnes, Sarah Filippi, Michael P.H. Stumpf, and Thomas Thorne posted a paper on arXiv on the selection of sufficient statistics for ABC model choice. This paper, called Considerate Approaches to Achieving Sufficiency for ABC model selection, was presented by Chris Barnes during ABC in London two months ago. (Note that all talks of the meeting are now available on Nature Precedings. A neat concept, by the way!) Their paper builds on our earlier warning about (unfounded) ABC model selection to propose a selection of summary statistics that partly alleviates the original problem. (The part about the discrepancy with the true posterior probability remains to be addressed, as does the issue of whether or not the selected collection of statistics provides a convergent model choice inference. We are currently working on it…) Their section “Resuscitating ABC model choice” states quite clearly the goal of the paper:

– this [use of inadequate summary statistics] mirrors problems that can also be observed in the parameter estimation context,
– for many important, and arguably the most important applications of ABC, this problem can in principle be avoided by using the whole data rather than summary statistics,
– in cases where summary statistics are required, we argue that we can construct approximately sufficient statistics in a disciplined manner,
– when all else fails, a change in perspective allows us to nevertheless make use of the flexibility of the ABC framework

The driving idea in the paper is to use an entropy approximation to measure the lack of information due to the use of a given set of summary statistics. The corresponding algorithm then proceeds from a starting pool of summary statistics to sequentially build a collection of the most informative summary statistics (which, in a sense, reminded me of a variable selection procedure based on the Kullback-Leibler divergence that we developed with Costas Goutis and Jérôme Dupuis). It is a very interesting advance on the issue of ABC model selection, even though it cannot eliminate all stumbling blocks. The interpretation that ABC should be processed as an inferential method in its own right rather than as an approximation to Bayesian inference is clearly appealing. (Fearnhead and Prangle, and Dean, Singh, Jasra and Peters could be quoted as well.)
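
To make the driving idea a bit more concrete, here is a minimal sketch of what such a sequential selection could look like in a plain rejection-ABC setting. Everything in it (the greedy loop, the kernel-density-based Kullback-Leibler score between successive ABC posteriors, the stopping threshold, the function names) is my own illustrative assumption rather than the authors' entropy criterion or code:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kl_est(p_sample, q_sample):
    """Monte Carlo estimate of KL(p || q) between two univariate samples,
    using Gaussian kernel density estimates of both densities."""
    kp, kq = gaussian_kde(p_sample), gaussian_kde(q_sample)
    return float(np.mean(np.log(kp(p_sample) / kq(p_sample))))

def rejection_abc(obs_stats, theta, sim_stats, keep=500):
    """Keep the parameter draws whose simulated statistics fall closest
    (in Euclidean distance) to the observed statistics."""
    dist = np.linalg.norm(sim_stats - obs_stats, axis=1)
    return theta[np.argsort(dist)[:keep]]

def greedy_select(x_obs, theta, x_sim, stat_pool, tol=0.05):
    """Sequentially add the statistic from stat_pool (name -> function of one
    data set) whose inclusion moves the current ABC posterior of the scalar
    parameter the most, in estimated KL; stop once the gain drops below tol."""
    selected, current = [], theta            # reference starts as the prior sample
    while len(selected) < len(stat_pool):
        scores = {}
        for name in stat_pool:
            if name in selected:
                continue
            use = selected + [name]
            obs = np.array([stat_pool[s](x_obs) for s in use])
            sims = np.column_stack([np.apply_along_axis(stat_pool[s], 1, x_sim) for s in use])
            scores[name] = kl_est(rejection_abc(obs, theta, sims), current)
        best = max(scores, key=scores.get)
        if scores[best] < tol:               # no remaining statistic adds enough information
            break
        selected.append(best)
        obs = np.array([stat_pool[s](x_obs) for s in selected])
        sims = np.column_stack([np.apply_along_axis(stat_pool[s], 1, x_sim) for s in selected])
        current = rejection_abc(obs, theta, sims)
    return selected
```

The greedy structure is the point of the sketch; the paper itself works with an (approximated) entropy/mutual information criterion rather than with this crude sample-based score.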

A few things I do not agree with [or object to] in the paper are as follows (this is also the core of my referee’s report!):

  1. While the information theoretic motivation is attractive, I do not see [as a Bayesian?] the point of integrating over the data space (Results 1 and 2), since the expectation should only be against the parameter and not against the data. If S=S(X) is sufficient, then, almost surely, the posterior given X=x is the same as the posterior given S(x)=s(x). Checking that the expectation in X of the log divergence between both posteriors is zero is therefore unnecessary. So, in the end, this makes me wonder whether (mutual) information theory is the right approach to the problem, or rather the right motivation for the use of the Kullback-Leibler divergence, as I do not at all object to this measure of divergence! Also, what is the exact representation used in the paper for computing the Kullback-Leibler divergence and for evaluating the posterior densities from an ABC output in the log divergence? This should really be carefully detailed in the paper. Unless I miss the point… [At a minor level, in its current format, the data processing inequality is not properly defined: it should rather state that, when p(x,y,z)=p(y)p(x|y)p(z|y), i.e., when X and Z are conditionally independent given Y, the mutual information I(X;Y) is larger than the mutual information I(X;Z). Also, in order to define the mutual information between the parameter and the data, it seems to me that θ needs to be a random variable. Otherwise, the mutual information is zero…]
  2. Of course, and as clearly stated in the paper, the whole method relies on the assumption that there is a reference collection of summary statistics that is somehow sufficient. Which is rather unlikely in most realistic settings (this is noted in the Discussion, as well as in our PNAS paper). So the term sufficient should not be used as it is in Figure 3, for instance. Overall, the method of statistic selection [approximately] provides the subset of the reference collection with the same information content as the whole collection. So, its main impact is to exclude irrelevant summary statistics from a given collection. Which is already a very interesting outcome. What would be even more interesting in my opinion would be to evaluate the Kullback-Leibler divergence to the true posterior. [Again, at a minor level, deterministic summary statistics are repeatedly mentioned, but this sounds like an oxymoron to me: a statistic is a transform of the data X, so no randomness is involved. (Unless this means something regarding the noisy ABC version? I do not see how.) The start of Algorithm 1 is somewhat confusing: what is the meaning of a sufficient set of statistics? Presumably not a set of sufficient statistics.]
  3. Figure 1 of the paper compares the ABC outcome when using four different statistics, the empirical mean, empirical variance, minimum, and maximum, for a normal sample of unspecified size with unknown mean. The comment that only the empirical mean recovers the true posterior is both correct and debatable, because the minimum and maximum observations also contain information about the unknown mean, albeit at a lower convergence rate (a toy numerical sketch of this comparison is given after this list). This leads to the issue raised by one referee of our PNAS paper about the [lack of] worth in distinguishing between estimation and testing. At a mathematical level, it is correct that a wrong choice of summary statistic (like the empirical variance above) may provide no information for estimation as well as for testing. At a methodological level, we now agree that different statistics should be used for testing and for estimation. [Minor point: I find it surprising that the tolerance is the same for all collections of summary statistics. Using a log transform is certainly not enough to standardise them.]
  4. I find quite interesting the conclusion in the population genetic study of the paper that one model requires more statistics than another one. This is when considering estimation separately for each model. From a model choice perspective, this cannot be the case: all models must involve the same collection of summary statistics for the posterior probability to be correctly defined. This issue has been puzzling/plaguing me for years about ABC: a proper ABC approximation is model dependent; however, one needs the “same” statistics to run the comparison… Also, I fail to understand why “we can no longer use these same statistics [used for model estimation] for model checking”: why can we not [in principle] use for testing the statistics we already used for parameter estimation? In the limiting case where the whole data is used, this shows it is not impossible.
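
For the record, here is a tiny self-contained sketch of the comparison discussed in item 3; the sample size, prior, number of simulations, and acceptance rate below are my own choices, not those behind the paper’s Figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 1.0                          # sample size and known standard deviation
x_obs = rng.normal(2.0, sigma, n)           # "observed" data with true mean 2
N, keep = 100_000, 500                      # prior simulations and accepted draws

mu = rng.normal(0.0, 10.0, N)               # N(0, 10^2) prior on the unknown mean
x_sim = rng.normal(mu[:, None], sigma, (N, n))

stats = {
    "mean": lambda x: x.mean(axis=-1),
    "var":  lambda x: x.var(axis=-1),
    "min":  lambda x: x.min(axis=-1),
    "max":  lambda x: x.max(axis=-1),
}

for name, s in stats.items():
    dist = np.abs(s(x_sim) - s(x_obs))      # one summary statistic at a time
    post = mu[np.argsort(dist)[:keep]]      # rejection-ABC sample for the mean
    print(f"{name:>4}: ABC posterior mean {post.mean():6.3f}, sd {post.std():.3f}")

# exact conjugate posterior for the mean, for comparison
v = 1.0 / (1.0 / 10.0**2 + n / sigma**2)
m = v * n * x_obs.mean() / sigma**2
print(f"exact: posterior mean {m:6.3f}, sd {np.sqrt(v):.3f}")
```

With these (assumed) settings, the mean-based ABC output should sit close to the exact conjugate posterior, the minimum and maximum give wider but still informative approximations, and the variance-based output remains essentially at the prior, since the empirical variance carries no information about the mean of a normal sample.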
