Matt Moores, Tony Pettitt, and Kerrie Mengersen arXived a paper yesterday comparing different computational approaches to the processing of hidden Potts models and of the intractable normalising constant in the Potts model. This is a very interesting paper, first because it provides a comprehensive survey of the main methods used in handling this annoying normalising constant Z(β), namely pseudo-likelihood, the exchange algorithm, path sampling (a.k.a., thermal integration), and ABC. A massive simulation experiment with individual simulation times up to 400 hours leads to select path sampling (what else?!) as the (XL) method of choice. Thanks to a pre-computation of the expectation of the sufficient statistic E[S(Z)|β]. I just wonder why the same was not done for ABC, as in the recent Statistics and Computing paper we wrote with Matt and Kerrie. As it happens, I was actually discussing yesterday in Columbia of potential if huge improvements in processing Ising and Potts models by approximating first the distribution of S(X) for some or all β before launching ABC or the exchange algorithm. (In fact, this is a more generic desiderata for all ABC methods that simulating directly if approximately the summary statistics would being huge gains in computing time, thus possible in final precision.) Simulating the distribution of the summary and sufficient Potts statistic S(X) reduces to simulating this distribution with a null correlation, as exploited in Cucala and Marin (2013, JCGS, Special ICMS issue). However, there does not seem to be an efficient way to do so, i.e. without reverting to simulating the entire grid X…
Archive for QUT
With Matt Moores and Kerrie Mengersen, from QUT, we wrote this short paper just in time for the MCMSki IV Special Issue of Statistics & Computing. And arXived it, as well. The global idea is to cut down on the cost of running an ABC experiment by removing the simulation of a humongous state-space vector, as in Potts and hidden Potts model, and replacing it by an approximate simulation of the 1-d sufficient (summary) statistics. In that case, we used a division of the 1-d parameter interval to simulate the distribution of the sufficient statistic for each of those parameter values and to compute the expectation and variance of the sufficient statistic. Then the conditional distribution of the sufficient statistic is approximated by a Gaussian with these two parameters. And those Gaussian approximations substitute for the true distributions within an ABC-SMC algorithm à la Del Moral, Doucet and Jasra (2012).
Across 20 125 × 125 pixels simulated images, Matt’s algorithm took an average of 21 minutes per image for between 39 and 70 SMC iterations, while resorting to pseudo-data and deriving the genuine sufficient statistic took an average of 46.5 hours for 44 to 85 SMC iterations. On a realistic Landsat image, with a total of 978,380 pixels, the precomputation of the mapping function took 50 minutes, while the total CPU time on 16 parallel threads was 10 hours 38 minutes. By comparison, it took 97 hours for 10,000 MCMC iterations on this image, with a poor effective sample size of 390 values. Regular SMC-ABC algorithms cannot handle this scale: It takes 89 hours to perform a single SMC iteration! (Note that path sampling also operates in this framework, thanks to the same precomputation: in that case it took 2.5 hours for 10⁵ iterations, with an effective sample size of 10⁴…)
Since my student’s paper on Seaman et al (2012) got promptly rejected by TAS for quoting too extensively from my post, we decided to include me as an extra author and submitted the paper to this special issue as well.
Last evening, I read a nice paper with the above title by Drovandi, Pettitt and McCutchan, from QUT, Brisbane. Low count refers to observation with a small number of integer values. The idea is to mix ABC with the unbiased estimators of the likelihood proposed by Andrieu and Roberts (2009) and with particle MCMC… And even with a RJMCMC version. The special feature that makes the proposal work is that the low count features allows for a simulation of pseudo-observations (and auxiliary variables) that may sometimes authorise an exact constraint (that the simulated observation equals the true observation). And which otherwise borrows from Jasra et al. (2013) “alive particle” trick that turns a negative binomial draw into an unbiased estimation of the ABC target… The current paper helped me realise how powerful this trick is. (The original paper was arXived at a time I was off, so I completely missed it…) The examples studied in the paper may sound a wee bit formal, but they could lead to a better understanding of the method since alternatives could be available (?). Note that all those examples are not ABC per se in that the tolerance is always equal to zero.
The paper also includes reversible jump implementations. While it is interesting to see that ABC (in the authors’ sense) can be mixed with RJMCMC, it is delicate to get a feeling about the precision of the results, without a benchmark to compare to. I am also wondering about less costly alternatives like empirical likelihood and other ABC alternatives. Since Chris is visiting Warwick at the moment, I am sure we can discuss this issue next week there.
Today, Ewan Cameron arXived a paper that generalises our Robert and Marin (2010) paper on the measure theoretic difficulties (or impossibilities) of the Savage-Dickey ratio and on the possible resolutions. (A paper of mine’s I like very much despite it having neither impact nor quotes, whatsoever! Until this paper.) I met Ewan last year when he was completing a PhD with Tony Pettitt at QUT in astrostatistics, but he
also worked did not work on this transdimensional ABC algorithm with application to worm invasion in Northern Alberta (arXive I reviewed last week)… Ewan also runs a blog called Another astrostatistics blog, full of goodies, incl. the one where he announces he moves to… zoology in Oxford! Anyway, this note extends our paper and a mathematically valid Savage-Dickey ratio representation to the case when the posterior distributions have no density against the Lebesgue measure. For instance for Dirichlet processes or Gaussian processes priors. Using generic Radon-Nykodim derivatives instead. The example is somewhat artificial, superimposing a Dirichlet process prior onto the Old faithful benchmark. But this is an interesting entry, worth mentioning, into the computation of Bayes factors. And the elusive nature of the Savage-Dickey ratio representation.
After a sunny weekend to unpack and unwind, I am now back to my normal schedule, on my way to Paris-Dauphine for an R (second-chance) exam. Except for confusing my turn signal for my wiper, thanks to two weeks of intensive driving in four Australian states!, things are thus back to “normal”, meaning that I have enough of a control of my time to handle both daily chores like the R exam and long-term projects. Including the special issues of Statistical Science, TOMACS, and CHANCE (reviewing all books of George Casella in memoriam). And the organisation of MCMSki 4, definitely taking place in Chamonix on January 6-8, 2014, hopefully under the sponsorship of the newly created BayesComp section of ISBA. And enough broadband to check my usual sites and to blog ad nauseam.
This trip to Australia, along the AMSI Lectures as well as the longer visits to Monash and QUT, has been quite an exciting time, with many people met and ideas discussed. I came back with a (highly positive) impression of Australian universities as very active places, just along my impression of Australia being a very dynamic and thriving country, far far away from the European recession. I was particularly impressed by the number of students within Kerrie Mengersen’s BRAG group, when we did held discussions in classrooms that felt full like a regular undergrad class! Those discussions and meetings set me towards a few new projects along the themes of mixture estimation and model choice, as well as convergence assessment. During this trip, I however also felt the lack of long “free times” I have gotten used to, thanks to the IUF chair support, where I can pursue a given problem for a few hours without interruption. Which means that I did not work as much as I wanted to during this tour and will certainly avoid such multiple-step trips in a near future. Nonetheless, overall, the own under” experience was quite worth it! (Even without considering the two weeks of vacations I squeezed in the middle.)
Back to “normal” also means I already had two long delays caused by suicides on my train line…
Last year at this time, Peter Sarnak toured Australia talking about randomness in number theory and Moebius randomness in dynamics. Recently, he pointed a paper on the arXiv in which he claims that the distribution of [integer coordinate] points on the sphere of radius √n which satisfy
is random as n goes to infinity (the paper is much more precise). You mentioned tests which look for non-randomness. How does one test for a non-random distribution of points on the sphere?
Interesting question, both for linking two AMSI Lecture tours (Peter Sarnak’s schedule sounded more gruelling than mine!) and for letting me get a look at this paper. Plus for the connection with probabilistic number theory. This paper indeed stands within the area of randomness in number theory rather than random generation and I do not see an obvious connection here, but the authors of the paper undertake a study of the randomness of the solutions to the above equation for a fixed n using statistics and their limiting distribution. (I am not certain of the way points are obtained over a square on Fig. 1, presumably this is using the spherical coordinates of the projections over the unit sphere in R3.) Their statistics are the electrostatic energy, Ripley’s point pair statistic, the nearest neighbour spacing measure, minimum spacing, and the covering radius. The most surprising feature of this study is that this randomness seems to be specific to the dimension 3 case: when increasing the number of terms in the above equation, the distribution of the solutions seems more rigid and less random…