ABC à Montréal
So today was the NIPS 2014 workshop, “ABC in Montréal”, which started with a fantastic talk by Juliane Liepe on some exciting applications of ABC to the migration of immune cells, with the analysis of movies of those cells acting to heal a damaged fly wing and a cut fish tail. Quite amazing videos, really. (With the great opening line of ‘We have all cut a finger at some point in our lives’!) The statistical model behind those movies was a random walk on a grid, with different drift and bias features serving as model characteristics. Frank Wood managed to deliver his talk despite a severe case of food poisoning, with a great illustration of probabilistic programming that made me understand (at last!) the very idea behind it. And Vikash Mansinghka presented some applications in image analysis. Those two talks made me realise why probabilistic programming is so close to ABC, with a programming touch! Hence my invitation to talk today! Then Dennis Prangle presented his latest version of lazy ABC, which I have already commented on the ‘Og and which is somewhat connected with our delayed acceptance algorithm, to the point that maybe something common can stem from the two notions. Michael Blum ended the day with provocative answers to the provocative questions of Ted Meeds as to whether or not machine learning needs ABC (Ans. No!) and whether or not machine learning could help ABC (Ans. ???), with a happy mix-up between mechanistic and phenomenological models that helped generate discussion from the floor.
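[For intuition only, here is a minimal Python sketch of a biased random walk on a grid, the kind of simulator underlying such an ABC analysis; the drift and bias parameters below are placeholders of mine, not the actual model characteristics from the talk.]

```python
import numpy as np

def biased_grid_walk(n_steps, drift=(0.3, 0.0), bias=1.0, rng=None):
    """Toy biased random walk on a 2D integer grid.

    `drift` and `bias` are illustrative stand-ins for the drift/bias
    features mentioned above, not the parameters of the actual model.
    """
    rng = np.random.default_rng() if rng is None else rng
    steps = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])
    # tilt the four step probabilities towards the drift direction
    logits = bias * steps @ np.asarray(drift)
    probs = np.exp(logits) / np.exp(logits).sum()
    path = np.zeros((n_steps + 1, 2), dtype=int)
    for t in range(n_steps):
        path[t + 1] = path[t] + steps[rng.choice(4, p=probs)]
    return path
```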
The posters were also of much interest, with calibration as a distance measure by Michael Gutmann, in continuation of the poster he gave at MCMski, and Aaron Smith presenting his work with Luke Bornn, Natesh Pillai and Dawn Woodard on why a single pseudo-sample is enough for ABC efficiency. This gave me the opportunity to discuss with him the apparent contradiction with the result of Krzys Łatuszyński and Anthony Lee that geometric convergence of ABC-MCMC is only attained with a random number of pseudo-samples… And to wonder whether there is a geometric versus binomial dilemma in this setting, namely whether simulating pseudo-samples until one is accepted would be more efficient than just running one and discarding it in case it is too far. So, although the audience was not that large (when compared with the other “ABC in…” workshops and when considering the 2500+ attendees at NIPS over the week!), it was a great day where I learned a lot, did not doze off during talks (!), [and even had an epiphany of sorts on the treadmill when I realised I just had to take longer steps to reach 16km/h without hyperventilating!] So thanks to my fellow organisers, Neil D. Lawrence, Ted Meeds, Max Welling, and Richard Wilkinson for setting the program of that day! And, by the way, where’s the next “ABC in…”?! (Finland, maybe?)
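[As an aside on the geometric versus binomial question above, here is a toy Python sketch, in my own notation and with a made-up Gaussian simulator, contrasting the two ways of spending simulations: a single pseudo-sample that is kept or discarded, versus simulating until the first hit. It only illustrates the difference in simulation cost, not a complete (or valid) MCMC kernel.]

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta):
    # stand-in simulator: one pseudo-observation given theta
    return rng.normal(theta, 1.0)

def hit(x, y_obs, eps):
    return abs(x - y_obs) < eps

def single_pseudo_sample(theta, y_obs, eps):
    """'Binomial' strategy: one simulation, kept or discarded."""
    return hit(simulate(theta), y_obs, eps), 1          # (hit?, cost)

def until_first_hit(theta, y_obs, eps, max_tries=10_000):
    """'Geometric' strategy: simulate until the first hit (capped)."""
    for n in range(1, max_tries + 1):
        if hit(simulate(theta), y_obs, eps):
            return True, n
    return False, max_tries

# crude cost comparison at a fixed parameter value
theta, y_obs, eps = 0.5, 0.0, 0.1
costs = [until_first_hit(theta, y_obs, eps)[1] for _ in range(1_000)]
print("average number of simulations until a hit:", np.mean(costs))
```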
December 16, 2014 at 3:21 pm
Thanks for co-organising an excellent workshop. For me it was especially interesting to see some machine learning applications of ABC (like computer vision) and machine learning ideas on implementation (like probabilistic programming and getting summary statistics from deep learning).
I agree it would be useful to think about the connections between delayed acceptance and lazy ABC methods. It would also be nice to find a way of applying one or both ideas to the one-hit algorithm.
December 14, 2014 at 2:33 pm
“whether or not machine learning could help ABC (Ans. ???)” My answer was yes, for sure; I should have been more explicit about that. Many thanks for organizing the workshop and giving me an occasion to attend NIPS.
December 14, 2014 at 4:01 pm
Thank you, Michael!!! Incidentally, here is the example about using the mad and med as single summary statistics that I mentioned at the workshop. (Taking the log of the mad or of the empirical variance did not help.)
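[For the record, a minimal rejection-ABC sketch in Python keeping the median and mad as summary statistics; the normal location-scale model and the priors below are assumptions of mine for illustration, not necessarily the example referred to above.]

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical observed sample; the actual example may differ
y_obs = rng.normal(loc=2.0, scale=3.0, size=100)

def med_mad(x):
    m = np.median(x)
    return np.array([m, np.median(np.abs(x - m))])

s_obs = med_mad(y_obs)

def abc_rejection(n_sim=50_000, eps=0.2):
    """Plain rejection ABC with (median, mad) as the summaries."""
    kept = []
    for _ in range(n_sim):
        mu = rng.normal(0.0, 10.0)        # assumed prior on location
        sigma = rng.uniform(0.1, 10.0)    # assumed prior on scale
        z = rng.normal(mu, sigma, size=y_obs.size)
        if np.linalg.norm(med_mad(z) - s_obs) < eps:
            kept.append((mu, sigma))
    return np.array(kept)
```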
December 15, 2014 at 6:37 pm
Can you give me fixed values for the median and mad so that we can compare our analyses?
Mike
December 14, 2014 at 12:27 pm
Hi Xian, do you think the slides will become available on-line?
December 14, 2014 at 4:02 pm
We have actually not discussed this option, thanks for suggesting it, Chris!
December 13, 2014 at 9:22 pm
This was the first “ABC in…” that I have missed! I wish I could have been there, both to see this poster and to speak with Aaron, whom I have never met. Having read the paper by Luke, Natesh, Aaron and Dawn (BPSW), I see no actual contradiction between the results in their paper and the results in my paper with Krzys (LL).
Proposition 4 of BPSW plugs an interesting hole in the theory concerning pseudo-marginal methods (in the standard ABC setting with a positivity assumption, at least), namely that for a fixed number of pseudo-samples M there is at most a reduction by a factor of 2M in the asymptotic variance of MCMC estimates, for functions f for which this asymptotic variance is finite. This is a very interesting and novel relative quantitative bound. Furthermore, and adopting their notation, it implies that if var(f,Q_M) is finite for some M then var(f,Q_1) is also finite. So one cannot expand the class of functions for which the asymptotic variance is finite by increasing M! I do not think this was known, even if it may have been suspected.
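(In symbols, and reading the bound literally in what I take to be BPSW’s notation, this says that whenever var(f,Q_M) is finite,

```latex
\operatorname{var}(f, Q_1) \;\le\; 2M \, \operatorname{var}(f, Q_M),
```

so that averaging M pseudo-samples per iteration can never buy more than a factor of 2M in asymptotic variance over a single pseudo-sample.)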
Regarding the relationship to LL: we analyzed the one-hit Markov kernel proposed by Christophe Andrieu, Arnaud Doucet and myself in the discussion of Fearnhead and Prangle’s read paper, and showed that it can have finite associated asymptotic variance in situations where no Q_M does (this situation is definitely not rare in the ABC context using local proposals on a non-compact state space, although the functions themselves may or may not be unusual). This does indeed contradict the suggestion that “one pseudo-sample is enough”, but it results from the fact that BPSW only compare Q_1 with Q_M and not with any other type of MCMC kernel for ABC, such as the one-hit kernel. In LL we also show in an example that the cost-adjusted asymptotic variance associated with the one-hit kernel can be smaller than that of any Q_M, even when var(f,Q_M) is finite. This is not going to be true in general, but could possibly be true for some “well-behaved” classes of target distributions and functions f.
December 14, 2014 at 8:08 am
Thanks for this detailed comparison, Anthony! The central point seems to me to be the existence of finite versus infinite variance settings, which amazed me the first time I saw it.