Archive for Statistics and Computing

Approximate Bayesian computational methods on-line

Posted in R, Statistics, University life on October 25, 2011 by xi'an

Fig. 4 – Boxplots of the evolution [against ε] of ABC approximations to the Bayes factor. The representation is made in terms of frequencies of visits to [accepted proposals from] models MA(1) and MA(2) during an ABC simulation when ε corresponds to the 10%, 1%, 0.1% and 0.01% quantiles of the simulated autocovariance distances. The data is a time series of 50 points simulated from an MA(2) model. The true Bayes factor is then equal to 17.71, corresponding to posterior probabilities of 0.95 and 0.05 for the MA(2) and MA(1) models, respectively.

The survey we wrote with Jean-Michel Marin, Pierre Pudlo, and Robin Ryder is now published in [the expensive] Statistics and Computing (on-line). Besides recycling a lot of ‘Og posts on ABC, this paper has the (personal) appeal of giving us the first hint that all was not so rosy in terms of ABC model choice. I wonder whether or not it will be part of the ABC special issue.
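
As a rough illustration of the experiment summarised in the figure caption, here is a minimal Python sketch of ABC model choice between MA(1) and MA(2). It is not the code behind the survey: the uniform priors over the identifiability regions, the equal prior weights on both models, the use of the first two autocovariances as summaries, the parameter values behind the pseudo-observed series, and all function names are assumptions made for this example.

```python
import random


def simulate_ma(thetas, n, rng):
    """Simulate n points of x_t = e_t + sum_k thetas[k-1] * e_{t-k}, e_t ~ N(0, 1)."""
    q = len(thetas)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + q)]
    return [
        eps[t + q] + sum(th * eps[t + q - k - 1] for k, th in enumerate(thetas))
        for t in range(n)
    ]


def sample_prior(model, rng):
    """Assumed priors: uniform on (-1, 1) for MA(1), uniform on a triangle for MA(2)."""
    if model == 1:
        return (rng.uniform(-1.0, 1.0),)
    while True:  # rejection sampling from the enclosing rectangle
        t1, t2 = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        if t1 + t2 > -1.0 and t1 - t2 < 1.0:
            return (t1, t2)


def autocov(x, max_lag=2):
    """Empirical autocovariances at lags 1..max_lag (series assumed centred at zero)."""
    n = len(x)
    return [
        sum(x[t] * x[t - lag] for t in range(lag, n)) / n
        for lag in range(1, max_lag + 1)
    ]


def abc_model_choice(observed, n_sims, quantile, rng):
    """Return the frequencies of each model among the closest simulations."""
    s_obs = autocov(observed)
    draws = []
    for _ in range(n_sims):
        model = rng.choice((1, 2))  # equal prior weights (assumption)
        thetas = sample_prior(model, rng)
        s_sim = autocov(simulate_ma(thetas, len(observed), rng))
        dist = sum((a - b) ** 2 for a, b in zip(s_sim, s_obs))
        draws.append((dist, model))
    draws.sort(key=lambda d: d[0])
    # The tolerance is set implicitly as a quantile of the simulated distances.
    n_keep = max(1, round(quantile * n_sims))
    kept = [model for _, model in draws[:n_keep]]
    freq2 = kept.count(2) / n_keep
    return {1: 1.0 - freq2, 2: freq2}


def posterior_prob_from_bayes_factor(bf):
    """Posterior probability of the favoured model under equal prior weights."""
    return bf / (1.0 + bf)


rng = random.Random(2011)
observed = simulate_ma((0.6, 0.2), 50, rng)  # hypothetical MA(2) parameters
for q in (0.1, 0.01):
    print(q, abc_model_choice(observed, 5000, q, rng))
```

With equal prior weights, the ratio of the two frequencies is the ABC estimate of the Bayes factor, and a Bayes factor B translates into a posterior probability B/(1+B); for B = 17.71 this gives about 0.947, in line with the 0.95 quoted in the caption.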

Questions on ABC

Posted in Statistics, University life on May 31, 2011 by xi'an

Our ABC survey for Statistics and Computing (and the ABC special issue!) has been quickly revised, resubmitted, and rearXived. Here is our conclusion about some issues that remain unsolved (much more limited in scope than the program drafted by Halton!):

  1. the convergence results obtained so far are impractical in that they require either the tolerance to go to zero or the sample size to go to infinity. Obtaining exact error bounds for positive tolerances and finite sample sizes would bring a strong improvement in both the implementation of the method and in the assessment of its worth.
  2. in particular, the choice of the tolerance is so far handled from a very empirical perspective. Recent theoretical assessments show that a balance between Monte Carlo variability and target approximation is necessary, but the right amount of balance must be reached towards a practical implementation.
  3. even though ABC is often presented as a converging method that approximates Bayesian inference, it can also be perceived as an inference technique per se and hence analysed in its own right. Connections with indirect inference have already been drawn; however, the fine asymptotics of ABC would be most useful to derive. Moreover, it could indirectly provide indications about the optimal calibration of the algorithm.
  4. in connection with the above, the relation of ABC-based inference to other approximative methods like variational Bayes inference is so far unexplored. Comparing and interbreeding those different methods should become a research focus as well.
  5. the construction and selection of the summary statistics is so far highly empirical. An automated approach based on the principles of data analysis and approximate sufficiency would be much more attractive and convincing, especially in non-standard and complex settings.
  6. the debate about ABC-based model choice is so far inconclusive in that we cannot guarantee the validity of the approximation, while considering that a “large enough” collection of summary statistics provides an acceptable level of approximation. Evaluating the discrepancy by exploratory methods like the bootstrap would shed a much more satisfactory light on this issue.
  7. the method necessarily faces limitations imposed by large datasets or complex models, in that simulating pseudo-data may itself become an impossible task. Dimension-reducing techniques that would simulate directly the summary statistics will soon become necessary.

ABC in London [quick recap']

Posted in Statistics, Travel, University life on May 6, 2011 by xi'an

The meeting yesterday went very smoothly and nicely. Despite a tight schedule of 12 talks that made the meeting a very full day (and a very early start from Paris), it did not feel that exhausting, as also shown by the ensuing discussion in the Queen's Arms after the talks. (The organisation of the meeting by Michael Stumpf and his group at Imperial was splendid, with plenty of tea and food to sustain the audience, and a very nice conference room.) It obviously helped that I had read a large portion of the papers related to the talks.

The meeting started with David Balding recalling a few quotes from Alan Templeton to stress that ABC was not uniformly well-received in all circles, then Adam Powell gave a fascinating talk about an implementation of ABC on tracking the evolution of dairy farming in Europe. One amazing result in this work was that the whole of European cattle originated from a small herd of a few hundred domesticated aurochs in the Fertile Crescent! Simon Tavaré presented an equally fascinating study on the ancestral tree of primates that used a mix of ABC and MCMC, recently published in Systematic Biology, with the common ancestor estimated to have lived between 80 and 90 million years ago (and an additional estimate putting the divergence between humans and chimpanzees closer to 8 million years ago than to the 5 million years previously thought). Tina Toni talked about the application of ABC-SMC and ABC model choice to complex biochemical dynamics. Pierre Pudlo and Mohammed Sedki introduced the new ABC-SMC scheme for selecting the tolerance we are developing (with Jean-Michel Marin and Jean-Marie Cornuet), which builds on Del Moral, Doucet and Jasra’s ABC-SMC (and which will hopefully be completed soon and submitted to the Statistics and Computing special ABC issue). Oliver Ratmann showed an application of his model assessment approach to several epidemic datasets, including a superb influenza sequence. Ajay Jasra explained the main ideas in the ABC HMM paper I recently discussed (even mentioning the post during the talk!). Mark Beaumont started with a recollection of the developments on his GIMH algorithm and illustrated the use of particle MCMC with an ABC target in a dynamic admixture model with a sort of Dirichlet random walk on the admixture parameters. Michael Blum presented his study on the clear reduction in estimation error brought by linear and non-linear adjustments to the raw ABC output. Dennis Prangle then followed with a pedagogical introduction to the semi-automated ABC discussed several times on the ‘Og. In the final session on ABC model choice, Xavier Didelot started the discussion by stating the problem with Bayes factor approximation and its resolution in the case of exponential families, and Chris Barnes showed us a new method for picking summary statistics by a Kullback-Leibler criterion (Michael Stumpf had sent me the draft of the paper a few days ago and I will comment on the approach once it is available on arXiv).

Again, a very full but exhilarating day! Looking forward to the next edition in Roma!

Statistics and Computing and ABC

Posted in R, Statistics on February 23, 2011 by xi'an

Statistics and Computing has received several papers on ABC and plans to make a special ABC issue out of these. All submissions related to ABC that are made prior to late June 2011 and that are accepted will be published in this special issue. The special issue is identified as a specific article type on the on-line submissions page.

In case you have questions or requests about this special issue, please directly contact the Editor Gilles Celeux or the publishing editor. Not me! I am simply forwarding the announcement from the Editor to all those interested.

a survey on ABC

Posted in R, Statistics on January 7, 2011 by xi'an

With Jean-Michel Marin, Pierre Pudlo and Robin Ryder, we just completed a survey on the ABC methodology. It is now both arXived and submitted to Statistics and Computing. Rather interestingly, our first draft was written in Jean-Michel’s office in Montpellier by collating the ‘Og posts surveying new ABC papers! (Interestingly because this means that my investment in the ‘Og is now such that it needs to [and can] be recycled into papers and books. Another paper with Randal Douc is inspired by a reply to a comment…) Besides surveying the recent literature, this paper illustrates the behaviour of the ABC approximation in the simple case of the MA(2) model. Both graphs reproduced here illustrate the impact of the choice of the distance (above) and of the tolerance level (below, in a model choice setting).
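
To make the roles of the distance and of the tolerance concrete, here is a minimal Python sketch of a basic ABC rejection sampler for the MA(2) model. It is only an illustration under stated assumptions, not the code used for the survey: the uniform prior on the identifiability triangle, the two candidate distances (squared distance between raw series, and squared distance between the first two autocovariances), the parameter values behind the pseudo-observed series, and the function names are all choices made for this example.

```python
import random


def simulate_ma2(theta1, theta2, n, rng):
    """Simulate n points of x_t = e_t + theta1 * e_{t-1} + theta2 * e_{t-2}."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]
    return [eps[t + 2] + theta1 * eps[t + 1] + theta2 * eps[t] for t in range(n)]


def sample_prior(rng):
    """Assumed prior: uniform on -2 < t1 < 2, t1 + t2 > -1, t1 - t2 < 1."""
    while True:  # rejection sampling from the enclosing rectangle
        t1, t2 = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        if t1 + t2 > -1.0 and t1 - t2 < 1.0:
            return t1, t2


def raw_distance(x, y):
    """Squared Euclidean distance between the two series themselves."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def autocov(x, max_lag=2):
    """Empirical autocovariances at lags 1..max_lag (series assumed centred at zero)."""
    n = len(x)
    return [
        sum(x[t] * x[t - lag] for t in range(lag, n)) / n
        for lag in range(1, max_lag + 1)
    ]


def autocov_distance(x, y):
    """Squared Euclidean distance between the first two autocovariances."""
    return sum((a - b) ** 2 for a, b in zip(autocov(x), autocov(y)))


def abc_rejection(observed, distance, n_sims, quantile, rng):
    """Keep the prior draws whose pseudo-data fall closest to the observed series."""
    sims = []
    for _ in range(n_sims):
        theta = sample_prior(rng)
        pseudo = simulate_ma2(theta[0], theta[1], len(observed), rng)
        sims.append((distance(observed, pseudo), theta))
    sims.sort(key=lambda s: s[0])
    # The tolerance is the chosen quantile of the simulated distances.
    n_keep = max(1, round(quantile * n_sims))
    return [theta for _, theta in sims[:n_keep]]


rng = random.Random(7)
observed = simulate_ma2(0.6, 0.2, 50, rng)  # hypothetical "true" parameters
for name, dist in (("raw", raw_distance), ("autocov", autocov_distance)):
    accepted = abc_rejection(observed, dist, 5000, 0.01, rng)
    mean1 = sum(t1 for t1, _ in accepted) / len(accepted)
    mean2 = sum(t2 for _, t2 in accepted) / len(accepted)
    print(name, round(mean1, 2), round(mean2, 2))
```

Lowering the quantile tightens the tolerance at the cost of fewer accepted draws, which is the trade-off between Monte Carlo variability and approximation error that the choice of tolerance has to balance.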
