insufficient statistics for ABC model choice

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , on October 17, 2014 by xi'an

[Here is a revised version of my comments on the paper by Julien Stoehr, Pierre Pudlo, and Lionel Cucala, now to appear [both paper and comments] in Statistics and Computing special MCMSki 4 issue.]

Approximate Bayesian computation techniques are 2000’s successors of MCMC methods as handling new models where MCMC algorithms are at a loss, in the same way the latter were able in the 1990’s to cover models that regular Monte Carlo approaches could not reach. While they first sounded like “quick-and-dirty” solutions, only to be considered until more elaborate solutions could (not) be found, they have been progressively incorporated within the statistican’s toolbox as a novel form of non-parametric inference handling partly defined models. A statistically relevant feature of those ACB methods is that they require replacing the data with smaller dimension summaries or statistics, because of the complexity of the former. In almost every case when calling ABC is the unique solution, those summaries are not sufficient and the method thus implies a loss of statistical information, at least at a formal level since relying on the raw data is out of question. This forced reduction of statistical information raises many relevant questions, from the choice of summary statistics to the consistency of the ensuing inference.

In this paper of the special MCMSki 4 issue of Statistics and Computing, Stoehr et al. attack the recurrent problem of selecting summary statistics for ABC in a hidden Markov random field, since there is no fixed dimension sufficient statistics in that case. The paper provides a very broad overview of the issues and difficulties related with ABC model choice, which has been the focus of some advanced research only for a few years. Most interestingly, the authors define a novel, local, and somewhat Bayesian misclassification rate, an error that is conditional on the observed value and derived from the ABC reference table. It is the posterior predictive error rate

\mathbb{P}^{\text{ABC}}(\hat{m}(Y)\ne m|S(y^{\text{obs}}))

integrating in both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation. Or rather given the transform of the observation by the summary statistic S. The authors even go further to define the error rate of a classification rule based on a first (collection of) statistic, conditional on a second (collection of) statistic (see Definition 1). A notion rather delicate to validate on a fully Bayesian basis. And they advocate the substitution of the unreliable (estimates of the) posterior probabilities by this local error rate, estimated by traditional non-parametric kernel methods. Methods that are calibrated by cross-validation. Given a reference summary statistic, this perspective leads (at least in theory) to select the optimal summary statistic as the one leading to the minimal local error rate. Besides its application to hidden Markov random fields, which is of interest per se, this paper thus opens a new vista on calibrating ABC methods and evaluating their true performances conditional on the actual data. (The advocated abandonment of the posterior probabilities could almost justify the denomination of a paradigm shift. This is also the approach advocated in our random forest paper.)

a bootstrap likelihood approach to Bayesian computation

Posted in Books, R, Statistics, University life with tags , , , , , , , , on October 16, 2014 by xi'an

This paper by Weixuan Zhu, Juan Miguel Marín [from Carlos III in Madrid, not to be confused with Jean-Michel Marin, from Montpellier!], and Fabrizio Leisen proposes an alternative to our 2013 PNAS paper with Kerrie Mengersen and Pierre Pudlo on empirical likelihood ABC, or BCel. The alternative is based on Davison, Hinkley and Worton’s (1992) bootstrap likelihood, which relies on a double-bootstrap to produce a non-parametric estimate of the distribution of a given estimator of the parameter θ. Including a smooth curve-fitting algorithm step, for which not much description is available from the paper.

“…in contrast with the empirical likelihood method, the bootstrap likelihood doesn’t require any set of subjective constrains taking advantage from the bootstrap methodology. This makes the algorithm an automatic and reliable procedure where only a few parameters need to be specified.”

The spirit is indeed quite similar to ours in that a non-parametric substitute plays the role of the actual likelihood, with no correction for the substitution. Both approaches are convergent, with similar or identical convergence speeds. While the empirical likelihood relies on a choice of parameter identifying constraints, the bootstrap version starts directly from the [subjectively] chosen estimator of θ. For it indeed needs to be chosen. And computed.

“Another benefit of using the bootstrap likelihood (…) is that the construction of bootstrap likelihood could be done once and not at every iteration as the empirical likelihood. This leads to significant improvement in the computing time when different priors are compared.”

This is an improvement that could apply to the empirical likelihood approach, as well, once a large enough collection of likelihood values has been gathered. But only in small enough dimensions where smooth curve-fitting algorithms can operate. The same criticism applying to the derivation of a non-parametric density estimate for the distribution of the estimator of θ. Critically, the paper only processes examples with a few parameters.

In the comparisons between BCel and BCbl that are produced in the paper, the gain is indeed towards BCbl. Since this paper is mostly based on examples and illustrations, not unlike ours, I would like to see more details on the calibration of the non-parametric methods and of regular ABC, as well as on the computing time. And the variability of both methods on more than a single Monte Carlo experiment.

I am however uncertain as to how the authors process the population genetic example. They refer to the composite likelihood used in our paper to set the moment equations. Since this is not the true likelihood, how do the authors select their parameter estimates in the double-bootstrap experiment? The inclusion of Crakel’s and Flegal’s (2013) bivariate Beta, is somewhat superfluous as this example sounds to me like an artificial setting.

In the case of the Ising model, maybe the pre-processing step in our paper with Matt Moores could be compared with the other algorithms. In terms of BCbl, how does the bootstrap operate on an Ising model, i.e. (a) how does one subsample pixels and (b)what are the validity guarantees?

A test that would be of interest is to start from a standard ABC solution and use this solution as the reference estimator of θ, then proceeding to apply BCbl for that estimator. Given that the reference table would have to be produced only once, this would not necessarily increase the computational cost by a large amount…

6th French Econometrics Conference in Dauphine

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , on October 15, 2014 by xi'an

La Défense, from Paris-Dauphine, May 2009

On December 4-5, Université Paris-Dauphine will host the 6th French Econometric Conference, which celebrates Christian Gouriéroux and his contributions to econometrics. (Christian was my statistics professor during my graduate years at ENSAE and then Head of CREST when I joined this research unit, first as a PhD student and later as Head of the statistics group. And he has always been a tremendous support for me.)

Not only is the program quite impressive, with co-authors of Christian Gouriéroux and a few Nobel laureates (if not the latest, Jean Tirole, who taught economics at ENSAE when I was a student there), but registration is free. I will most definitely attend the talks, as I am in Paris-Dauphine at this time of year (the week before NIPS). In particular, looking forward to Gallant’s views on Bayesian statistics.

my ISBA tee-shirt designs

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , on October 15, 2014 by xi'an

Here are my tee-shirt design proposals for the official ISBA tee-shirt competition! (I used the facilities of as I could not easily find a free software around. Except for the last one where I recycled my vistaprint mug design…)


While I do not have any expectation of seeing one of these the winner (!), what is your favourite one?!

Le Monde puzzle [#882]

Posted in Books, Kids, Statistics, University life with tags , , on October 14, 2014 by xi'an

A terrific Le Monde mathematical puzzle:

All integers between 1 and n² are written in an (n,n)  matrix under the constraint that two consecutive integers are adjacent (i.e. 15 and 13 are two of the four neighbours of 14). What is the maximal value for the sum of the diagonal of this matrix?

Indeed, when considering a simulation resolution (for small values of m), it constitutes an example of self-avoiding random walk: when inserting the integers one by one at random, one produces a random walk over the (n,n) grid.

While the solution is trying to stick as much as possible to the diagonal vicinity for the descending sequence n²,n²-1, &tc., leaving space away from the diagonal for the terminal values, as in this example for n=5,

25 22 21 14 13
24 23 20 15 12
01 02 19 16 11
04 03 18 17 10
05 06 07 08 09

simulating such a random walk is a bit challenging as the brute force solution does not work extremely well: Continue reading

how far can we go with Minard’s map?!

Posted in Books, Linux, pictures, Statistics, Travel with tags , , , , , , , , , , on October 13, 2014 by xi'an

Like many others, I discovered Minard’s map of the catastrophic 1812 Russian campaign of Napoleon in Tufte’s book. And I consider it a masterpiece for its elegant way of summarising some many levels of information about this doomed invasion of Russia. So when I spotted Menno-Jan Kraak’s Mapping Time, analysing the challenges of multidimensional cartography through this map and this Naepoleonic campaign, I decided to get a look at it.

Apart from the trivia about Kraak‘s familial connection with the Russian campaign and the Berezina crossing which killed one of his direct ancestors, his great-great-grandfather, along with a few dozen thousand others (even though this was not the most lethal part of the campaign), he brings different perspectives on the meaning of a map and the quantity of information one could or should display. This is not unlike other attempts at competiting with Minard, including those listed on Michael Friendly’s page. Incl. the cleaner printing above. And the dumb pie-chart… A lot more can be done in 2013 than in 1869, indeed, including the use of animated videos, but I remain somewhat sceptical as to the whole purpose of the book. It is a beautiful object, with wide margins and nice colour reproductions, for sure, alas… I just do not see the added value in Kraak‘s work. I would even go as far as thinking this is an a-statistical approach, namely that by trying to produce as much data as possible into the picture, he forgets the whole point of the drawing which is I think to show the awful death rate of the Grande Armée along this absurd trip to and from Moscow and the impact of temperature (although the rise that led to the thaw of the Berezina and the ensuing disaster does not seem correlated with the big gap at the crossing of the river). If more covariates were available, two further dimensions could be added: the proportions of deaths due to battle, guerilla, exhaustion, desertion, and the counterpart map of the Russian losses. In the end, when reading Mapping Time, I learned more about the history surrounding this ill-planned military campaign than about the proper display of data towards informative and unbiased graphs.

poor graph of the day

Posted in Books with tags , , , on October 12, 2014 by xi'an



Get every new post delivered to your Inbox.

Join 667 other followers