Another great day of talks and discussions at BIRS! Continuing on the themes of the workshop between delving into the further validation of those approximation techniques and the devising of ever more approximate solutions for ever more complex problems. Among the points that came clearer to me through discussion, a realisation that the synthetic likelihood perspective is not that far away from our assumptions in the consistency paper. And that a logistic version of the approach can be constructed as well. A notion I had not met before (or have forgotten I had met) is the one of early rejection ABC, which should actually be investigated more thoroughly as it should bring considerable improvement in computing time (with the caveats of calibrating the acceptance step before producing the learning sample and of characterising the output). Both Jukka Corander and Ewan Cameron reminded us of the case of models that take minutes or hours to produce one single dataset. (In his talk on some challenging applications, Jukka Corander chose to move from socks to boots!) And Jean-Michel Marin produced an illuminating if sobering experiment on the lack of proper Bayesian coverage by ABC solutions. (It appears that Ewan’s video includes a long empty moment when we went out for the traditional group photo, missing the end of his talk.)
Archive for ABC
The exact title of the paper by Jovana Metrovic, Dino Sejdinovic, and Yee Whye Teh is DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression. It appeared last year in the proceedings of ICML. The idea is to build ABC summaries by way of reproducing kernel Hilbert spaces (RKHS). Regressing such embeddings to the “optimal” choice of summary statistics by kernel ridge regression. With a possibility to derive summary statistics for quantities of interest rather than for the entire parameter vector. The use of RKHS reminds me of Arthur Gretton’s approach to ABC, although I see no mention made of that work in the current paper.
In the RKHS pseudo-linear formulation, the prediction of a parameter value given a sample attached to this value looks like a ridge estimator in classical linear estimation. (I thus wonder at why one would stop at the ridge stage instead of getting the full Bayes treatment!) Things get a bit more involved in the case of parameters (and observations) of interest, as the modelling requires two RKHS, because of the conditioning on the nuisance observations. Or rather three RHKS. Since those involve a maximum mean discrepancy between probability distributions, which define in turn a sort of intrinsic norm, I also wonder at a Wasserstein version of this approach.
What I find hard to understand in the paper is how a large-dimension large-size sample can be managed by such methods with no visible loss of information and no explosion of the computing budget. The authors mention Fourier features, which never rings a bell for me, but I wonder how this operates in a general setting, i.e., outside the iid case. The examples do not seem to go into enough details for me to understand how this massive dimension reduction operates (and they remain at a moderate level in terms of numbers of parameters). I was hoping Jovana Mitrovic could present her work here at the 17w5025 workshop but she sadly could not make it to Banff for lack of funding!
The ABC workshop I co-organised has now started and, despite a few last minutes cancellations, we have gathered a great crowd of researchers on the validation and expansion of ABC methods. Or ABC’ory to keep up with my naming of workshops. The videos of the talks should come up progressively on the BIRS webpage. When I did not forget to launch the recording. The program is quite open and with this size of workshop allows for talks and discussions to last longer than planned: the first days contain several expository talks on ABC convergence, auxiliary or synthetic models, summary constructions, challenging applications, dynamic models, and model assessment. Plus prepared discussions on those topics that hopefully involve several workshop participants. We had also set some time for snap-talks, to induce everyone to give a quick presentation of one’s on-going research and open problems. The first day was rather full but saw a lot of interactions and discussions during and around the talks, a mood I hope will last till Friday! Today in replacement of Richard Everitt who alas got sick just before the workshop, we are conducting a discussion on dimensional issues, part of which is made of parts of the following slides (mostly recycled from earlier talks, including the mini-course in Les Diablerets):
Today, I fly from Paris to Amsterdam to Calgary to attend the ABC’ory workshop (15w2214) at the Banff International Research Station (BIRS) that Luke Bornn, Jukka Corander, Gael Martin, Dennis Prangle, Richard Wilkinson and myself built. The meeting is to brainstorm about the foundations of ABC for statistical inference rather than about the computational aspects of ABC, but the schedule is quite flexible for other directions!
Since I could not download the slides of my ABC course in Les Diablerets in one go, I broke them by chapters as follows. (Warning: there is very little novelty in those slides, except for the final part on consistency.)
Although I did not do it on purpose (!), starting with indirect inference and other methods inspired from econometrics induced some discussion in the first hour of the course with econometricians in the room. Including Elvezio Ronchetti.
I also regretted piling too much material in the alphabet soup, as it was too widespread for a new audience. And as I could not keep the coherence of the earlier parts by going thru so many papers at once. Especially since I was a bit knackered after a day of skiing….
I managed to get to the final convergence chapter on the last day, even though I had to skip some of the earlier material. Which should be reorganised anyway as the parts between model choice with random forests and inference with random forests are not fully connected!
A recent question on X validated ended up being quite interesting! The model under consideration is made of parallel Markov chains on a finite state space, all with the same Markov transition matrix, M, which turns into a hidden Markov model when the only summary available is the number of chains in a given state at a given time. When writing down the EM algorithm, the E step involves the expected number of moves from a given state to a given state at a given time. The conditional distribution of those numbers of chains is a product of multinomials across times and starting states, with no Markov structure since the number of chains starting from a given state is known at each instant. Except that those multinomials are constrained by the number of “arrivals” in each state at the next instant and that this makes the computation of the expectation intractable, as far as I can see.
A solution by Monte Carlo EM means running the moves for each instant under the above constraints, which is thus a sort of multinomial distribution with fixed margins, enjoying a closed-form expression but for the normalising constant. The direct simulation soon gets too costly as the number of states increases and I thus considered a basic Metropolis move, using one margin (row or column) or the other as proposal, with the correction taken on another margin. This is very basic but apparently enough for the purpose of the exercise. If I find time in the coming days, I will try to look at the ABC resolution of this problem, a logical move when starting from non-sufficient statistics!