Archive for Banff
And another exciting and animated [last] day of ABC’ory [and practice]! Kyle Cranmer exposed a density ratio density estimation approach I had not seen before [and will comment here soon]. Patrick Muchmore talked about unbiased estimators of Gaussian and non-Gaussian densities in elliptically contoured distributions which allows for running pseudo-MCMC than ABC. This reminded me of using the same tool [for those distributions can be expressed as mixtures of normals] in my PhD thesis, if for completely different purposes. In his talk, including a presentation of an ABC blackbox platform called ELFI, Samuel Kaski did translate statistical inference as inverse reinforcement learning: I hope this does not catch! In the afternoon, Dennis Prangle gave us the intuition behind his rare event ABC, which is not estimating rare events by ABC but rather using rare event simulation to improve ABC. [A paper I will a.s. comment here soon as well!] And Scott Sisson concluded the day and the week with his views on ABC for high dimensions.
While being obviously biased as the organiser of the workshop, I nonetheless feel it was a wonderful meeting with just the right number of participants to induce interactions and discussions during and around the talk, as well as preserve some time for pairwise interactions. Like all other workshops I contributed to in BIRS along the years
|07w5079||2007-07-01||Bioinformatics, Genetics and Stochastic Computation: Bridging the Gap|
|10w2170||2010-09-10||Hierarchical Bayesian Methods in Ecology|
|14w5125||2014-03-02||Advances in Scalable Bayesian Computation|
this is certainly a highly profitable one! For a [major] change, the next one [18w5023] will take place in Oaxaca, Mexico, and will see computational statistics meet molecular simulation. [As an aside, here are the first and last slides of Ewan Cameron’s talk, appropriately illustrating beginning and end, for both themes of his talk: epidemiology and astronomy!]
Another great day of talks and discussions at BIRS! Continuing on the themes of the workshop between delving into the further validation of those approximation techniques and the devising of ever more approximate solutions for ever more complex problems. Among the points that came clearer to me through discussion, a realisation that the synthetic likelihood perspective is not that far away from our assumptions in the consistency paper. And that a logistic version of the approach can be constructed as well. A notion I had not met before (or have forgotten I had met) is the one of early rejection ABC, which should actually be investigated more thoroughly as it should bring considerable improvement in computing time (with the caveats of calibrating the acceptance step before producing the learning sample and of characterising the output). Both Jukka Corander and Ewan Cameron reminded us of the case of models that take minutes or hours to produce one single dataset. (In his talk on some challenging applications, Jukka Corander chose to move from socks to boots!) And Jean-Michel Marin produced an illuminating if sobering experiment on the lack of proper Bayesian coverage by ABC solutions. (It appears that Ewan’s video includes a long empty moment when we went out for the traditional group photo, missing the end of his talk.)
The exact title of the paper by Jovana Metrovic, Dino Sejdinovic, and Yee Whye Teh is DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression. It appeared last year in the proceedings of ICML. The idea is to build ABC summaries by way of reproducing kernel Hilbert spaces (RKHS). Regressing such embeddings to the “optimal” choice of summary statistics by kernel ridge regression. With a possibility to derive summary statistics for quantities of interest rather than for the entire parameter vector. The use of RKHS reminds me of Arthur Gretton’s approach to ABC, although I see no mention made of that work in the current paper.
In the RKHS pseudo-linear formulation, the prediction of a parameter value given a sample attached to this value looks like a ridge estimator in classical linear estimation. (I thus wonder at why one would stop at the ridge stage instead of getting the full Bayes treatment!) Things get a bit more involved in the case of parameters (and observations) of interest, as the modelling requires two RKHS, because of the conditioning on the nuisance observations. Or rather three RHKS. Since those involve a maximum mean discrepancy between probability distributions, which define in turn a sort of intrinsic norm, I also wonder at a Wasserstein version of this approach.
What I find hard to understand in the paper is how a large-dimension large-size sample can be managed by such methods with no visible loss of information and no explosion of the computing budget. The authors mention Fourier features, which never rings a bell for me, but I wonder how this operates in a general setting, i.e., outside the iid case. The examples do not seem to go into enough details for me to understand how this massive dimension reduction operates (and they remain at a moderate level in terms of numbers of parameters). I was hoping Jovana Mitrovic could present her work here at the 17w5025 workshop but she sadly could not make it to Banff for lack of funding!