insufficient statistics for ABC model choice

[Here is a revised version of my comments on the paper by Julien Stoehr, Pierre Pudlo, and Lionel Cucala, now to appear [both paper and comments] in Statistics and Computing special MCMSki 4 issue.]

Approximate Bayesian computation (ABC) techniques are the 2000s successors of MCMC methods, handling new models where MCMC algorithms are at a loss, in the same way the latter were able in the 1990s to cover models that regular Monte Carlo approaches could not reach. While they first sounded like “quick-and-dirty” solutions, only to be considered until more elaborate solutions could (not) be found, they have been progressively incorporated within the statistician’s toolbox as a novel form of non-parametric inference handling partly defined models. A statistically relevant feature of those ABC methods is that they require replacing the data with smaller-dimension summaries or statistics, because of the complexity of the former. In almost every case where ABC is the only available solution, those summaries are not sufficient and the method thus implies a loss of statistical information, at least at a formal level, since relying on the raw data is out of the question. This forced reduction of statistical information raises many relevant questions, from the choice of summary statistics to the consistency of the ensuing inference.
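A minimal sketch of ABC model choice by rejection may make this summary-based reduction concrete. Everything in it is an illustrative assumption, not taken from the paper: a hypothetical toy pair of models (Poisson with an Exp(1) prior on the mean versus geometric on {0, 1, ...} with a uniform prior on the success probability), a uniform prior on the model index, the sample mean as the only summary, and acceptance of the closest 1% of simulations. The sample mean is sufficient within each of these two models but not for choosing between them, which is exactly the kind of information loss discussed above.

```python
import numpy as np

def simulate(model, n, rng):
    """Draw one pseudo-dataset of size n from the prior predictive of `model`.

    Hypothetical toy models: 0 = Poisson(lam) with lam ~ Exp(1),
    1 = geometric on {0, 1, ...} with p ~ U(0, 1].
    """
    if model == 0:
        lam = rng.exponential(1.0)
        return rng.poisson(lam, size=n)
    p = 1.0 - rng.random()                  # in (0, 1], so geometric is always valid
    return rng.geometric(p, size=n) - 1     # NumPy's geometric starts at 1

def summary(y):
    """Low-dimensional (and here insufficient for model choice) summary."""
    return y.mean()

def abc_model_choice(y_obs, n_sims=20000, quantile=0.01, seed=0):
    """Rejection ABC estimate of P(model | summary(y_obs)) for the two models."""
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    models = rng.integers(0, 2, size=n_sims)            # uniform prior on model index
    stats = np.array([summary(simulate(m, n, rng)) for m in models])
    dist = np.abs(stats - summary(y_obs))
    keep = dist <= np.quantile(dist, quantile)          # closest simulations only
    return np.bincount(models[keep], minlength=2) / keep.sum()
```

The returned vector estimates the ABC posterior probabilities of the two models given the summary only; with an insufficient summary these target the probabilities conditional on S(y^obs), not on the full data, even as the tolerance shrinks.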

In this paper of the special MCMSki 4 issue of Statistics and Computing, Stoehr et al. attack the recurrent problem of selecting summary statistics for ABC in a hidden Markov random field, since there is no fixed-dimension sufficient statistic in that case. The paper provides a very broad overview of the issues and difficulties related to ABC model choice, which has only been the focus of advanced research for a few years. Most interestingly, the authors define a novel, local, and somewhat Bayesian misclassification rate, an error that is conditional on the observed value and derived from the ABC reference table. It is the posterior predictive error rate

\mathbb{P}^{\text{ABC}}(\hat{m}(y^{\text{obs}})\ne m|S(y^{\text{obs}}))

integrating over both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation, or rather given the transform of the observation by the summary statistic S. The authors even go further and define the error rate of a classification rule based on a first (collection of) statistic(s), conditional on a second (collection of) statistic(s) (see Definition 1), a notion rather delicate to validate on a fully Bayesian basis. And they advocate replacing the unreliable (estimates of the) posterior probabilities with this local error rate, estimated by traditional non-parametric kernel methods calibrated by cross-validation. Given a reference summary statistic, this perspective leads (at least in theory) to selecting the optimal summary statistic as the one achieving the minimal local error rate. Besides its application to hidden Markov random fields, which is of interest per se, this paper thus opens a new vista on calibrating ABC methods and evaluating their true performance conditional on the actual data. (The advocated abandonment of the posterior probabilities could almost justify the label of a paradigm shift. This is also the approach advocated in our random forest paper.)
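The estimation strategy just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the same hypothetical Poisson-versus-geometric toy pair, a k-nearest-neighbour ABC classifier built on a first statistic `s1`, out-of-sample misclassification indicators computed on a held-out part of the reference table, a Gaussian Nadaraya-Watson smoother of those indicators on a second statistic `s2`, and a bandwidth picked by leave-one-out cross-validation over a small hypothetical grid. All helper names are hypothetical, and the paper's kernel and cross-validation details may well differ.

```python
import numpy as np

# Hypothetical toy models: 0 = Poisson(lam), lam ~ Exp(1);
#                          1 = geometric on {0, 1, ...}, p ~ U(0, 1].
def simulate(model, n, rng):
    if model == 0:
        return rng.poisson(rng.exponential(1.0), size=n)
    return rng.geometric(1.0 - rng.random(), size=n) - 1

# Candidate summary statistics (log1p tames heavy geometric tails).
def stats_mean(y):
    return np.log1p([y.mean()])

def stats_mean_var(y):
    return np.log1p([y.mean(), y.var()])

def knn_predict(train_x, train_m, query_x, k):
    """Majority vote of the k nearest reference-table entries (two models)."""
    d = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (train_m[nearest].mean(axis=1) > 0.5).astype(int)

def nw_smooth(x, e, x0, h):
    """Gaussian Nadaraya-Watson estimate of E[e | x = x0]."""
    d2 = np.sum((x - x0) ** 2, axis=1)
    w = np.exp(-0.5 * (d2 - d2.min()) / h**2)   # shift by min avoids underflow
    return np.sum(w * e) / np.sum(w)

def cv_bandwidth(x, e, grid):
    """Bandwidth minimising the leave-one-out squared error of the smoother."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)
    best, best_err = grid[0], np.inf
    for h in grid:
        w = np.exp(-0.5 * d2 / h**2)
        np.fill_diagonal(w, 0.0)                # leave each point out
        pred = w @ e / np.maximum(w.sum(axis=1), 1e-300)
        err = np.mean((e - pred) ** 2)
        if err < best_err:
            best, best_err = h, err
    return best

def local_error_rate(y_obs, s1, s2, n_train=2000, n_val=1000, k=50,
                     grid=(0.05, 0.1, 0.2, 0.5, 1.0), seed=0):
    """Estimate P(m_hat(Y) != m | s2(Y) = s2(y_obs)) for a classifier built on s1."""
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    m = rng.integers(0, 2, size=n_train + n_val)    # uniform prior on the model
    ys = [simulate(mi, n, rng) for mi in m]          # ABC reference table
    x1 = np.array([s1(y) for y in ys])               # classifier statistic
    x2 = np.array([s2(y) for y in ys])               # conditioning statistic
    pred = knn_predict(x1[:n_train], m[:n_train], x1[n_train:], k)
    errors = (pred != m[n_train:]).astype(float)     # held-out misclassification
    h = cv_bandwidth(x2[n_train:], errors, grid)
    return nw_smooth(x2[n_train:], errors, s2(y_obs), h)
```

Comparing, say, `local_error_rate(y_obs, stats_mean, stats_mean)` with `local_error_rate(y_obs, stats_mean_var, stats_mean)` mimics the selection rule: holding the conditioning statistic fixed as the reference, one keeps the classifier statistic with the smaller estimated local error at the observed data. With such a small table the Monte Carlo noise is substantial, so this only illustrates the mechanics.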

2 Responses to “insufficient statistics for ABC model choice”

  1. There is a major difference between the conditional error rates
    introduced in Stoehr et al. (2014) and the posterior (predictive)
    error rates in Pudlo, Marin et al. ABC via random forests (arXiv).
    The only common feature between both local error rates is that they
    depend on the observed data (through some summaries because of ABC).

    The first one is the conditional expected value of the
    misclassification loss given that the data (or more precisely some
    summaries of the data) are what we have observed. Hence, when we
    integrate this conditional error over the marginal distribution of the
    summaries of the data, we recover the misclassification error integrated
    over the whole prior space.

    The posterior (predictive) error rate (presented in the paper “ABC via random
    forests”) relies on an expected value over the predictive distribution
    given the observed data. Thus, it includes a second integral over
    the data space which does not appear in the conditional error rate of
    Stoehr et al., and its computation requires new simulations drawn from
    the posterior distribution.

    As a consequence, the conditional error rates of Stoehr et al. are on
    the same footing as the posterior probabilities (see, for instance,
    Proposition 2 of the paper), a feature not shared by the posterior
    predictive error.
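One way of writing the two quantities contrasted in this comment, in the notation of the post (m the model index, θ the parameter, S the summary statistic, \hat{m} the classifier), is given below. Here π(m, θ) denotes the joint prior, f_m the likelihood of model m, and τ_S is our own shorthand; this is an illustrative reading, not a transcription of either paper's definitions. The conditional error rate and its average over the marginal distribution of the summaries are

\tau_S(s) = \mathbb{P}\big(\hat{m}(Y)\ne m \mid S(Y)=s\big), \qquad \mathbb{E}\big[\tau_S(S(Y))\big] = \sum_{m}\int\!\!\int \mathbb{1}\{\hat{m}(y)\ne m\}\, f_m(y\mid\theta)\,\pi(m,\theta)\,\mathrm{d}y\,\mathrm{d}\theta

where the equality is the law of total expectation and the right-hand side is the prior misclassification rate. The posterior predictive error rate instead averages over a fresh pseudo-dataset \tilde{y} generated given the observed summaries:

\sum_{m}\int\!\!\int \mathbb{1}\{\hat{m}(\tilde{y})\ne m\}\, f_m(\tilde{y}\mid\theta)\,\pi\big(m,\theta\mid S(y^{\text{obs}})\big)\,\mathrm{d}\tilde{y}\,\mathrm{d}\theta

The integral over \tilde{y} is the extra one that calls for new simulations from the (ABC) posterior.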
