Our paper on using empirical likelihood for Bayesian computation (with Kerrie Mengersen and Pierre Pudlo) has been accepted by PNAS [after we removed the A from ABCel!], which is terrific news! It has already appeared on-line as early edition in the issue of January 7. Which is also terrific! (Unfortunately, it is not open access, contrary to the previous PNAS paper on ABC model choice as the cost was just too high.)
Archive for PNAS
We just resubmitted our paper to PNAS about using empirical likelihood for conducting Bayesian computation. Although this is an approximation as well, we removed the A (for approximation) from the title and from the name of the method, BCel, to comply with a referee’s request and also account for several comments during our seminars that this was not ABC! We can see the point in those comments, namely that ABC is understood as a corpus of methods that rely on the simulation of pseudo-datasets to compensate for the missing likelihood, while empirical likelihood stands as another route bypassing this difficulty… I keep my fingers crossed that this ultimate revision is convincing enough for the PNAS board!
Coincidentally, Jean-Pierre Florens came to give a (Malinvaud) seminar at CREST today about semi-parametric Bayesian modelling, mixing Gaussian process priors with generalised moment conditions. This was a fairly involved talk with a lot of technical details about RKHS spaces and a mix of asymptotics and conjugate priors (somewhat empirical Bayesianish in spirit!) In a sense, it was puzzling because the unknown distribution was modelled conditional on an unknown parameter, θ, which itself was a function of this distribution. It was however quite interesting in that it managed to mix Gaussian process priors with some sort of empirical likelihood (or GMM). Furthermore, in a sort of antithesis to our approach with empirical likelihood, Florens and Simoni had a plethora of moment restrictions they called over-identification and used this feature to improve the estimation of the underlying density. There were also connections with Fukumizu et al. kernel Bayes’ rule perspective, even though I am not clear about the later. I also got lost here by the representation of the data as a point in an Hilbert space, thanks to a convolution step. (The examples involved orthogonal polynomials like Lagrange’s or Hermitte’s, which made sense as the data was back to a finite dimension!) Once again, the most puzzling thing is certainly over-identification: in an empirical likelihood version, it would degrade the quality of the approximation by peaking more and more the approximation. It does not appear to cause such worries in Florens’ and Simoni’s perspective.
As mentioned in my review of Paradoxes in Scientific Inference I was a bit confused by this presentation of the likelihood principle and this led me to ponder for a week or so whether or not there was an issue with Birnbaum’s proof (or, much more likely, with my vision of it!). After reading again Birnbaum’s proof, while sitting down in a quiet room at ICERM for a little while, I do not see any reason to doubt it. (Keep reading at your own risk!)
My confusion was caused by mixing sufficiency in the sense of Birnbaum’s mixed experiment with sufficiency in the sense of our ABC model choice PNAS paper, namely that sufficient statistics are not always sufficient to select the right model. The sufficient statistics in the proof reduces the (2,x2) observation from Model 2 to (1,x1) from Model 1 when there is an observation x1 that produces a likelihood proportional to the likelihood for x2 and the statistic is indeed sufficient: the distribution of (2,x2) given (1,x1) does not depend on the parameter θ. Of course, the statistic is not sufficient (most of the time) for deciding between Model 1 and Model 2, but this model choice issue is foreign to Birnbaum’s construction.
While there are already several books about BUGS and WinBUGS on the market, e.g. the one by Ioannis Ntzoufras I reviewed a while ago, I was quite pleased to discover in the mail today that CRC Press had sent me a copy of The BUGS Book, written by no-one else but the parents of the BUGS software themselves! As I was not aware the book had been published a month or so ago… Before anyone send me an email or a comment requesting the book for review in CHANCE, it has already been sent to a knowledgeable BUGSpert to whose detailed analysis I am looking forward. (I still had a quick look at the book and noticed a reference to our PNAS paper on ABC model choice, yay!)
We (Kerrie Mengersen, Pierre Pudlo, and myself) have now revised our ABC with empirical likelihood paper and resubmitted both to arXiv and to PNAS as “Approximate Bayesian computation via empirical likelihood“. The main issue raised by the referees was that the potential use of the empirical likelihood (EL) approximation is much less widespread than the possibility of simulating pseudo-data, because EL essentially relies on an iid sample structure, plus the availability of parameter defining moments. This is indeed the case to some extent and also the reason why we used a compound likelihood for our population genetic model. There are in fact many instances where we simply cannot come up with a regular EL approximation… However, the range of applications of straight EL remains wide enough to be of interest, as it includes most dynamical models like hidden Markov models. To illustrate this point further, we added (in this revision) an example borrowed from the recent Biometrika paper by David Cox and Christiana Kartsonaki (which proposes a frequentist alternative to ABC based on fractional design). This model ended up being fairly appealing wrt our perspective: while the observed data is dependent in a convoluted way, being a superposition of N renewal processes with gamma waiting times, it is possible to recover an iid structure at the same cost as a regular ABC algorithm by using the pseudo-data to recover an iid process (the sequence of renewal processes indicators)…The outcome is quite favourable to ABCel in this particular case, as shown by the graph below (top: ABCel, bottom: ABC, red line:truth):
This revision (started while visiting Kerrie in Brisbane) was thus quite beneficial to our perception of ABC in that (a) it is indeed not as universal as regular ABC and this restriction should be spelled out (the advantage being that, when it can be implemented, it usually runs much much faster!), and (b) in cases where the pseudo-data must be simulated, EL provides a reference/benchmark for the ABC output that comes for free… Now I hope to manage to get soon out of the “initial quality check” barrage to reach the Editorial Board!