Archive for survey

Questions on ABC

Posted in Statistics, University life on May 31, 2011 by xi'an

Our ABC survey for Statistics and Computing (and the ABC special issue!) has been quickly revised, resubmitted, and rearXived. Here is our conclusion about some issues that remain unsolved (much more limited in scope than the program drafted by Halton!):

  1. the convergence results obtained so far are impractical in that they require either the tolerance to go to zero or the sample size to go to infinity. Obtaining exact error bounds for positive tolerances and finite sample sizes would considerably improve both the implementation of the method and the assessment of its worth.
  2. in particular, the choice of the tolerance is so far handled from a very empirical perspective. Recent theoretical assessments show that a balance between Monte Carlo variability and target approximation is necessary, but the right balance has yet to be turned into a practical implementation.
  3. even though ABC is often presented as a converging method that approximates Bayesian inference, it can also be perceived as an inference technique per se and hence analysed in its own right. Connections with indirect inference have already been drawn, but the fine asymptotics of ABC would be most useful to derive. Moreover, they could indirectly provide indications about the optimal calibration of the algorithm.
  4. in connection with the above, the link between ABC-based inference and other approximate methods like variational Bayes inference is so far unexplored. Comparing and interbreeding those different methods should become a research focus as well.
  5. the construction and selection of the summary statistics is so far highly empirical. An automated approach based on the principles of data analysis and approximate sufficiency would be much more attractive and convincing, especially in non-standard and complex settings.
  6. the debate about ABC-based model choice is so far inconclusive in that we cannot guarantee the validity of the approximation, while considering that a “large enough” collection of summary statistics provides an acceptable level of approximation. Evaluating the discrepancy by exploratory methods like the bootstrap would shed a much more satisfactory light on this issue.
  7. the method necessarily faces limitations imposed by large datasets or complex models, in that simulating pseudo-data may itself become an impossible task. Dimension-reducing techniques that would directly simulate the summary statistics will soon become necessary.
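To make the tolerance trade-off in the first two points concrete, here is a minimal sketch of rejection ABC on a toy problem. Everything in it is an assumption chosen for illustration and not taken from the survey: the N(theta, 1) model, the Uniform(-5, 5) prior, the sample mean as summary statistic, the three tolerance values and the helper name `abc_rejection`.

```python
import random
import statistics


def abc_rejection(s_obs, n_obs, eps, n_sims, rng):
    """Rejection ABC for the mean of a N(theta, 1) sample of size n_obs.

    Keeps the prior draws whose simulated sample mean falls within
    eps of the observed sample mean s_obs.
    """
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)  # draw from the (assumed) prior
        pseudo = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # pseudo-data
        s_sim = sum(pseudo) / n_obs  # summary statistic: the sample mean
        if abs(s_sim - s_obs) <= eps:  # tolerance check
            accepted.append(theta)
    return accepted


rng = random.Random(0)
s_obs, n_obs, n_sims = 1.0, 20, 5000  # hypothetical observed summary
tolerances = (0.05, 0.5, 2.0)

results = {}
for eps in tolerances:
    sample = abc_rejection(s_obs, n_obs, eps, n_sims, rng)
    # acceptance rate and spread of the accepted parameter values
    results[eps] = (len(sample) / n_sims, statistics.pstdev(sample))
    print(f"eps={eps}: acceptance rate={results[eps][0]:.3f}, "
          f"sd of accepted theta={results[eps][1]:.3f}")
```

A larger tolerance accepts more draws, hence less Monte Carlo variability, but the accepted values spread further away from the exact posterior. This is only a toy version of the balance discussed above.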

a survey on ABC

Posted in R, Statistics on January 7, 2011 by xi'an

With Jean-Michel Marin, Pierre Pudlo and Robin Ryder, we just completed a survey on the ABC methodology. It is now both arXived and submitted to Statistics and Computing. Rather interestingly, our first draft was written in Jean-Michel’s office in Montpellier by collating the ‘Og posts surveying new ABC papers! (Interestingly because this means that my investment in the ‘Og is now such that it needs to [and can] be recycled into papers and books. Another paper with Randal Douc is inspired by a reply to a comment…) Besides surveying the recent literature, this paper illustrates the behaviour of the ABC approximation in the simple case of the MA(2) model. Both graphs reproduced here illustrate the impact of the choice of the distance (above) and of the tolerance level (below, in a model choice setting).
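For readers who want to play with the MA(2) illustration, here is a rough sketch of rejection ABC in that setting. It is not the code behind the graphs. The uniform prior on the identifiability triangle, the first two autocovariances as summary statistics, the Euclidean distance, the "keep the closest 2%" tolerance rule, the parameter values (0.6, 0.2) and all function names are assumptions made for this example.

```python
import math
import random


def simulate_ma2(theta1, theta2, n, rng):
    """Simulate y_t = u_t + theta1*u_{t-1} + theta2*u_{t-2}, u_t ~ N(0, 1)."""
    u = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]
    return [u[t + 2] + theta1 * u[t + 1] + theta2 * u[t] for t in range(n)]


def autocov(y, lag):
    """Uncentred autocovariance at the given lag (the model has mean zero)."""
    return sum(y[t] * y[t - lag] for t in range(lag, len(y))) / len(y)


def summaries(y):
    """Summary statistics: autocovariances at lags 1 and 2."""
    return autocov(y, 1), autocov(y, 2)


def sample_prior(rng):
    """Uniform draw on the triangle -2<t1<2, t1+t2>-1, t1-t2<1."""
    while True:
        t1, t2 = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        if t1 + t2 > -1.0 and t1 - t2 < 1.0:
            return t1, t2


def abc_ma2(y_obs, n_sims, keep_frac, rng):
    """Keep the keep_frac share of prior draws closest to the observed summaries."""
    s_obs = summaries(y_obs)
    draws = []
    for _ in range(n_sims):
        theta = sample_prior(rng)
        s_sim = summaries(simulate_ma2(theta[0], theta[1], len(y_obs), rng))
        dist = math.hypot(s_sim[0] - s_obs[0], s_sim[1] - s_obs[1])
        draws.append((dist, theta))
    draws.sort(key=lambda d: d[0])
    kept = draws[:max(1, int(keep_frac * n_sims))]
    # the implied tolerance is the largest distance among the kept draws
    return [theta for _, theta in kept], kept[-1][0]


rng = random.Random(1)
y_obs = simulate_ma2(0.6, 0.2, 200, rng)  # synthetic "observed" series
accepted, eps = abc_ma2(y_obs, n_sims=2000, keep_frac=0.02, rng=rng)
post_mean = (sum(t[0] for t in accepted) / len(accepted),
             sum(t[1] for t in accepted) / len(accepted))
print(f"{len(accepted)} draws kept, implied tolerance {eps:.3f}, "
      f"ABC posterior mean {post_mean[0]:.2f}, {post_mean[1]:.2f}")
```

Swapping the distance (for instance, comparing the raw series instead of the autocovariances) or changing `keep_frac` is an easy way to see how sensitive the accepted sample is to both choices.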

Insect collection [live]

Posted in Kids, pictures on May 24, 2010 by xi'an

The Museum d’Histoire Naturelle has launched a large survey of pollen-gathering insects all over France and asks volunteers to contribute their observations according to a rather simple protocol. (Well, not that simple: I tried to upload my series of pictures of the lilac tree at the back of my garden and failed…) I do not know how representative the data thus gathered will be, nor how the biologists in charge of the study are going to use it, but this is a neat idea nonetheless…

An incomplete history of Markov Chain Monte Carlo

Posted in Statistics, University life on April 28, 2009 by xi'an

Last August, George Casella and I posted on arXiv a paper entitled “A History of Markov Chain Monte Carlo — Subjective Recollections from Incomplete Data —” that contained some recollections (of ours and of others) on the emergence of MCMC methods. The paper is still under review, but Brad Carlin just sent me the following comments:

I learned a ton by reading your paper, and I thought I knew a little MC history! I have a few comments which I obnoxiously organize by page number (at least in the version I got):

p.2: I actually was in a hotel room at Valencia 4 (1991) and saw Andrew Thomas sitting cross-legged on a hotel bed with a primitive version of BUGS running on a primitive “portable” computer. I think they were doing the rat data problem from G&S (1990). Even then it was easy to see they were on to something.

p.5: I love that Metropolis et al had a data analysis where they used 16 burn-in and 64 “production” iterations: it makes me feel better about my past applied work.

p.15: Thanks for the nice ref to Carlin and Chib — and you are going to send me scrambling to Brooks et al to re-read that “completion scheme”. But then at the end of this para I was a little disappointed you didn’t mention Spiegelhalter et al (2002), since the ready availability of DIC within BUGS is another reason why nobody ever uses RJMCMC for model choice, as was originally envisioned. I actually had a PhD student go insane trying to successfully program an RJMCMC algorithm, and since then I’ve tried to avoid asking anyone else to do it.

p.17: you sure you want to say “plod on”? Sounds like you’re tired, and I know this is not true!

p.18, etc: Thanks so much for this reminder of the 1991 OSU conference. I remember giving that talk in the second session on the first day with Mockus (who like me was visiting CMU at that time) and Cliff Litton. The LCG paper wound up being my interview talk for the Minnesota job, and later a discussion paper in JASA. But then so many of these talks became important papers: Tierney (94), Gelman and Rubin (92) and its ‘evil twin’ Geyer (92), Gilks (92), Albert and Chib (93), and on and on. It really was a who’s who and a heady time for a kid like me, less than 2 years out of grad school. Fortunately I had a big ego.

If you can stand one more anecdote, the most memorable aspect of the conference was this apparently older, bald guy standing in the back of the room shouting that the guys on stage didn’t know what they were talking about. It didn’t matter who the guys were (Gelman, Smith, Gelfand, etc), he was not shy about expressing his opinions. Many involved the notion that many parallel chains would be worse than one long chain. Of course this guy turned out to be Charlie Geyer, and the whole episode (esp with Gelman) presaged the train wreck their papers hit at Stat Sci a few months later.

We’ll make sure to incorporate those helpful recollections into the revision when (if) the paper comes back! (Maybe leaving out the health warning that RJMCMC may cause insanity…)
