## insufficient statistics for ABC model choice

**J**ulien Stoehr, Pierre Pudlo, and Lionel Cucala (I3M, Montpellier) arXived yesterday a paper entitled “Geometric summary statistics for ABC model choice between hidden Gibbs random fields”. Julien had presented this work at the MCMski 4 poster session. The move to a *hidden* Markov random field means that our original approach with Aude Grelaud does not apply: there is no dimension-reducing sufficient statistic in that case… The authors introduce a small collection of (four!) focussed statistics to discriminate between Potts models. They further define a novel misclassification rate, conditional on the observed value and derived from the ABC reference table. It is the predictive error rate

$$\mathbb{P}^{\text{ABC}}\big(\hat{m}(Y)\ne m \,\big|\, S(y^{\text{obs}})\big)$$

integrating over both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation, or rather given the transform of the observation by the summary statistic S. In a simulation experiment, the paper shows that the predictive error rate decreases quite a lot when 2 or 4 geometric summary statistics are added on top of the no-longer-sufficient concordance statistics. (I did not find how the distance is constructed and how it adapts to a larger number of summary statistics.)
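To make the idea of an error rate that is conditional on the observed summary and read off the ABC reference table more concrete, here is a minimal sketch. It is not the authors' estimator: the k-nearest-neighbour classifier, the leave-one-out evaluation, the plain Euclidean distance, and every name in the code (`knn_predict`, `local_error_rate`, the toy `table`) are assumptions made for illustration. The reference table is taken to be a list of `(model index, summary vector)` pairs simulated from the prior predictive.

```python
import math
from collections import Counter


def euclidean(s, t):
    """Plain L2 distance between two summary-statistic vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))


def nearest(table, s, k, skip=None):
    """Indices of the k table entries whose summaries are closest to s."""
    order = sorted(
        (i for i in range(len(table)) if i != skip),
        key=lambda i: euclidean(table[i][1], s),
    )
    return order[:k]


def knn_predict(table, s, k, skip=None):
    """Majority model index among the k simulations nearest to s."""
    votes = Counter(table[i][0] for i in nearest(table, s, k, skip))
    return votes.most_common(1)[0][0]


def local_error_rate(table, s_obs, k_classifier, k_local):
    """Share of misclassified simulations among the k_local nearest to s_obs.

    Each neighbour is classified leave-one-out, so it cannot vote for itself.
    """
    neighbours = nearest(table, s_obs, k_local)
    errors = sum(
        knn_predict(table, table[i][1], k_classifier, skip=i) != table[i][0]
        for i in neighbours
    )
    return errors / len(neighbours)


# Toy reference table (hypothetical numbers): model 0 near 0, model 1 near 5.
table = [(0, (x / 10,)) for x in range(10)] + [(1, (5 + x / 10,)) for x in range(10)]
print(knn_predict(table, (0.2,), k=3))
print(local_error_rate(table, (0.2,), k_classifier=3, k_local=5))
```

Under these assumptions, comparing `local_error_rate` across candidate vectors of summary statistics gives a rough, data-dependent way of ranking them, which is the kind of comparison the simulation experiment reports.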

“[the ABC posterior probability of index m] uses the data twice: a first one to calibrate the set of summary statistics, and a second one to compute the ABC posterior.” (p.8)

**I**t took me a while to understand the above quote. If we consider ABC model choice as we did in our original paper, it only and correctly uses the data once. However, if we select the vector of summary statistics based on an empirical performance indicator resulting from the data, then indeed the procedure does use the data twice! Is there a generic way or trick to compensate for that, apart from cross-validation?

February 12, 2014 at 9:13 pm

I’ve got an easy question (sorry – I don’t really know anything about discrete MRFs).

Why is this an important question in application?

Does the underlying model have a scientific meaning, or is it a nuisance parameter?

In the continuous case, the choice of smoothness (which is roughly equivalent to the choice of neighbourhood structure) has implications for the frequentist properties of the posterior (in particular, in a slightly unrealistic asymptotic regime, if you specify your model to be too smooth, you will have zero frequentist coverage). Is this similar here?

February 13, 2014 at 7:20 am

This is the spatial equivalent of a hidden Markov model, so I can think of potential applications, even though they are not developed in this paper…

February 13, 2014 at 4:45 pm

Ok. I was wondering if this was a “build it and they will come” sort of situation. I guess you need a solution before problems arrive :p

February 12, 2014 at 3:30 pm

Thank you for sharing this recap of the paper. We do not mention any specific distance in order to keep most of the framework generic, but it is obviously something missing in the “Experiments results” part. We used a normalized L2 distance between the summary statistics in the different algorithms. It might be useful to add it for the discussion.
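As a rough illustration of what a normalized L2 distance between summary statistics can look like, here is a minimal sketch. The comment above does not say which normalization was used, so dividing each coordinate by its standard deviation across the reference table is purely an assumption, as are the names `column_scales` and `normalized_l2` and the toy numbers.

```python
import math
import statistics


def column_scales(stat_vectors):
    """Standard deviation of each summary statistic across the simulations.

    A zero spread is replaced by 1.0 so that the division below is always defined.
    """
    scales = [statistics.pstdev(col) for col in zip(*stat_vectors)]
    return [c if c > 0 else 1.0 for c in scales]


def normalized_l2(s, t, scales):
    """L2 distance after dividing each coordinate by its scale."""
    return math.sqrt(sum(((a - b) / c) ** 2 for a, b, c in zip(s, t, scales)))


# Two hypothetical simulated summaries living on very different scales.
sims = [(0.0, 0.0), (2.0, 200.0)]
scales = column_scales(sims)
print(normalized_l2(sims[0], sims[1], scales))
```

With a per-coordinate scaling of this kind, adding further summary statistics does not let the one with the largest raw range dominate the comparison.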

I would answer the last question by saying that we do not want to estimate the distribution of the model index with that procedure; we just want to pick the right model, even if there is a bias in the ABC predictor. Indeed, we might artificially increase the posterior probability of the best model index when selecting the vector of summary statistics from the data; but no ABC procedure comparing data sets through non-sufficient statistics can claim to approximate the true posterior p(m|x), see Robert et al. (PNAS, 2011). The sole guarantee that remains is that ABC procedures will pick the correct model if we provide enough data (Marin et al., JRSS-B, 2014). Hence we believe that our predictor, which “uses the data twice”, does not drop any important numerical result from the ABC machinery. And thus, we did not seek a way to compensate for the bias introduced by the selection of statistics from the data.

February 12, 2014 at 3:34 pm

Thank you. Although it sounds too much like a “learner’s reply” to me (and not Bayesian enough to be satisfactory to me), I see your point.