Jukka Corander sent me the announcement that he is opening a 3-year postdoctoral position at the University of Oslo, to work with him and his team on ABC projects. This sounds like quite an exciting offer, and it gives the nominee the opportunity to live in the most enjoyable city of Oslo for several years in fairly comfy conditions! The deadline is May 31. (If I were at a stage of my career where applying made sense, I would definitely apply. Not even waiting for the outcome of the French elections on May 7!)
A new arXiv entry on ways to approximate marginal likelihoods based on MCMC output, by astronomers (apparently). With an application to the 2015 Planck satellite analysis of cosmic microwave background radiation data, which reminded me of our joint work with the cosmologists of the Paris Institut d’Astrophysique ten years ago. In the literature review, the authors miss several surveys on the approximation of those marginals, including our San Antonio chapter on Bayes factor approximations, but mention our ABC survey somewhat inappropriately, since it does not advocate the use of ABC for such a purpose. (They mention as well variational Bayes approximations, INLA, powered likelihoods, if not nested sampling.)
The proposal of this paper is to identify the marginal m [actually denoted a there] as the normalising constant of an unnormalised posterior density. And to do so the authors estimate the posterior by a non-parametric approach, namely a k-nearest-neighbour estimate. With the additional twist of producing a sort of Bayesian posterior on the constant m. [And the unusual notion of number density, used for the unnormalised posterior.] The Bayesian estimation of m relies on a Poisson sampling assumption on the k-nearest neighbour distribution. (Sort of, since k is actually fixed, not random.)
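To make the identification of m with a normalising constant concrete, here is a minimal sketch, assuming posterior draws and a computable unnormalised log-density are available. It is not the paper's method: the Poisson-based Bayesian treatment of m is replaced by a plain median of the pointwise ratios between the unnormalised density and a k-nearest-neighbour density estimate. The function names, the toy Gaussian target and the choice k=20 are all assumptions made for illustration.

```python
import heapq
import math
import random
import statistics


def log_unit_ball_volume(d):
    """Log-volume of the unit Euclidean ball in dimension d."""
    return (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)


def knn_log_density(samples, k):
    """k-nearest-neighbour log-density estimate at each sample point."""
    n, d = len(samples), len(samples[0])
    log_vd = log_unit_ball_volume(d)
    log_dens = []
    for i, x in enumerate(samples):
        dists = (math.dist(x, y) for j, y in enumerate(samples) if j != i)
        r_k = heapq.nsmallest(k, dists)[-1]  # distance to the k-th neighbour
        log_dens.append(
            math.log(k) - math.log(n - 1) - log_vd - d * math.log(r_k)
        )
    return log_dens


def log_marginal_knn(samples, log_unnorm, k=20):
    """Median over the sample of log q(theta_i) - log p_hat(theta_i)."""
    log_p_hat = knn_log_density(samples, k)
    ratios = [log_unnorm(x) - lp for x, lp in zip(samples, log_p_hat)]
    return statistics.median(ratios)


# Toy check (assumed example): q(theta) = exp(-|theta|^2 / 2) in dimension 2,
# whose normalising constant is 2 * pi, with exact draws standing in for MCMC.
rng = random.Random(0)
samples = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(800)]


def log_q(theta):
    return -0.5 * sum(t * t for t in theta)


log_m_hat = log_marginal_knn(samples, log_q, k=20)
log_m_true = math.log(2 * math.pi)
print(f"estimated log m = {log_m_hat:.3f}, exact log m = {log_m_true:.3f}")
```

The radius to the k-th neighbour enters the estimate to the power d, which is where the sensitivity to dimension comes from.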
If the above sounds confusing and imprecise, it is because I am myself rather mystified by the whole approach and find it difficult to see the point of this alternative. The Bayesian numerics do not seem to have any purpose other than producing a MAP estimate. And using a non-parametric density estimate opens a Pandora's box of difficulties, the most obvious one being the curse of dimension(ality). This reminded me of the commented paper of Delyon and Portier where they achieve super-efficient convergence when using a kernel estimator, but with a considerable cost and a similar sensitivity to dimension.
For students, postdocs and faculty working on approximate inference, ABC algorithms, and likelihood-free methods, this announcement of two postdoc positions at Monash University, Melbourne, Australia, to work with Gael Martin, David Frazier and Catherine Forbes should be of strong relevance and particular interest:
The Department of Econometrics and Business Statistics at Monash is looking to fill two postdoc positions, one for 12 months and the other for 2 years. The positions will be funded (respectively) by the following ARC Discovery grants:
1. DP150101728: “Approximate Bayesian Computation in State Space Models”. (Chief Investigators: Professor Gael Martin and Associate Professor Catherine Forbes; International Partner Investigators: Professor Brendan McCabe and Professor Christian Robert).
2. DP170100729: “The Validation of Approximate Bayesian Computation: Theory and Practice”. (Chief Investigators: Professor Gael Martin and Dr David Frazier; International Partner Investigators: Professor Christian Robert and Professor Eric Renault).
The deadline for applications is April 28th, 2017, and the nominal starting date is July, 2017 (although there is some degree of flexibility on that front).
We consider Bayesian empirical likelihood estimation and develop an efficient Hamiltonian Monte Carlo method for sampling from the posterior distribution of the parameters of interest. The method proposed uses hitherto unknown properties of the gradient of the underlying log-empirical-likelihood function. We use results from convex analysis to show that these properties hold under minimal assumptions on the parameter space, prior density and the functions used in the estimating equations determining the empirical likelihood. Our method employs a finite number of estimating equations and observations but produces valid semi-parametric inference for a large class of statistical models including mixed effects models, generalized linear models and hierarchical Bayes models. We overcome major challenges posed by complex, non-convex boundaries of the support routinely observed for empirical likelihood which prevent efficient implementation of traditional Markov chain Monte Carlo methods like random-walk Metropolis–Hastings sampling etc. with or without parallel tempering. A simulation study confirms that our method converges quickly and draws samples from the posterior support efficiently. We further illustrate its utility through an analysis of a discrete data set in small area estimation.
[The comment is reposted from Series B’log, where I wrote it first.]
It is of particular interest for me [disclaimer: I was not involved in the review of this paper!] as we worked on ABC through empirical likelihood, which is about the reverse of the current paper in terms of motivation: when faced with a complex model, we substitute an empirical likelihood version for the real thing, run simulations from the prior distribution and use the empirical likelihood as a proxy. With possible intricacies when the data is not iid (an issue we also met with Wasserstein distances). In this paper the authors instead consider working on an empirical likelihood as their starting point and derive an HMC algorithm to do so. The idea is striking in that, by nature, an empirical likelihood is not a very smooth object and hence does not seem open to producing gradients and Hessians. As illustrated by Figure 1 in the paper. Which is so spiky at places that one may wonder at the representativity of such graphs.
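The substitution described above (simulate from the prior, weight by the empirical likelihood) can be sketched as follows for the simplest possible case, a scalar mean with the single constraint E[X − μ] = 0. This is a toy illustration under stated assumptions, not the actual BCel implementation: the data, the uniform prior on (−3, 3), the bisection solver and the function names are all hypothetical choices.

```python
import math
import random


def log_el_mean(xs, mu, n_iter=200):
    """Log empirical likelihood ratio for the mean of xs, evaluated at mu.

    Solves sum_i z_i / (1 + lam * z_i) = 0 with z_i = x_i - mu by bisection,
    then returns -sum_i log(1 + lam * z_i), which is at most 0.
    """
    z = [x - mu for x in xs]
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        return float("-inf")  # mu outside the convex hull of the data
    lo, hi = -1.0 / z_max, -1.0 / z_min  # keeps every 1 + lam * z_i positive
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break  # interval collapsed to floating-point resolution
        g = sum(zi / (1.0 + mid * zi) for zi in z)
        if g > 0.0:  # g is decreasing in lam, so the root is to the right
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    terms = [1.0 + lam * zi for zi in z]
    if min(terms) <= 0.0:
        return float("-inf")
    return -sum(math.log(t) for t in terms)


# Hypothetical data set with sample mean 0.5.
data = [0.3, -0.8, 1.5, 0.9, -0.2, 2.1, 0.6, 1.1, -1.2, 0.7]
xbar = sum(data) / len(data)

# Prior draws weighted by the empirical likelihood (importance sampling).
rng = random.Random(1)
prior_draws = [rng.uniform(-3.0, 3.0) for _ in range(5000)]
weights = [math.exp(log_el_mean(data, mu)) for mu in prior_draws]
post_mean = sum(w * mu for w, mu in zip(weights, prior_draws)) / sum(weights)
print(f"sample mean = {xbar:.3f}, weighted posterior mean = {post_mean:.3f}")
```

Prior draws falling outside the range of the data receive weight zero, which is the support issue discussed further down in this post.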
I have always had a persistent worry about the ultimate validity of treating the empirical likelihood as a genuine likelihood, from the fact that it is the result of an optimisation problem to the issue that the approximate empirical distribution has a finite (data-dependent) support, hence is completely orthogonal to the true distribution. And to the one that the likelihood function is zero outside the convex hull of the defining equations… (For one thing, this empirical likelihood is always bounded by one but this may be irrelevant after all!)
The computational difficulty in handling the empirical likelihood starts with its support. Eliminating values of the parameter for which this empirical likelihood is zero amounts to checking whether zero belongs to the above convex hull. A hard (NP hard?) problem. (Although I do not understand why the authors dismiss the token observations of Owen and others. The argument that Bayesian analysis does more than maximising a likelihood seems to confuse the empirical likelihood as a product of a maximisation step with the empirical likelihood as a function of the parameter that can be used as any other function.)
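As an illustration of the support check, here is a minimal sketch restricted to two estimating equations, where testing whether zero lies inside the convex hull reduces to an angular-gap test: the origin is interior exactly when no angular gap between consecutive points reaches π. For a finite set of points in higher dimension, the same question can be posed as a linear feasibility problem, which this sketch does not attempt. The regression-style estimating functions, the tiny data set and all function names are assumptions for the example.

```python
import math


def origin_in_hull_interior_2d(points):
    """True if (0, 0) is in the interior of the convex hull of 2-D points."""
    angles = sorted(math.atan2(y, x) for x, y in points if (x, y) != (0.0, 0.0))
    if len(angles) < 3:
        return False
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2.0 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return max(gaps) < math.pi


def estimating_points(theta, xs, ys):
    """Moment functions (e_i, e_i * x_i) with e_i = y_i - theta0 - theta1 * x_i."""
    t0, t1 = theta
    return [(y - t0 - t1 * x, (y - t0 - t1 * x) * x) for x, y in zip(xs, ys)]


def in_el_support(theta, xs, ys):
    """Whether the empirical likelihood is positive at theta (2 constraints)."""
    return origin_in_hull_interior_2d(estimating_points(theta, xs, ys))


# Hypothetical data generated as y = 1 + 2 x + e with e = +/- 0.5.
xs = [-1.0, -1.0, 1.0, 1.0]
ys = [-0.5, -1.5, 3.5, 2.5]

print(in_el_support((1.0, 2.0), xs, ys))   # residuals of both signs
print(in_el_support((10.0, 2.0), xs, ys))  # all residuals negative
```

With the second parameter value every residual is negative, so all points sit in the same half-plane and the empirical likelihood vanishes.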
In the simple regression example (pp. 297-299), I find the choice of the moment constraints puzzling, in that they address the mean of the white noise (zero) and the covariance with the regressors (zero too). Puzzling because my definition of the regression model is conditional on the regressors and hence does not imply anything on their distribution. In a sense this is another model. But I also note that the approach focuses on the distribution of the reconstituted white noises, as we did in the PNAS paper. (The three examples processed in the paper are all simple and could be processed by regular MCMC, thus making the preliminary step of calling for an empirical likelihood somewhat artificial unless I missed the motivation. The paper also does not seem to discuss the impact of the choice of the moment constraints or the computing constraints involved by a function that is itself the result of a maximisation problem.)
A significant part of the paper is dedicated to the optimisation problem and the exclusion of the points on the boundary. Which sounds like a non-problem in continuous settings. However, this appears to be of importance for running an HMC as it cannot evade the support (without token observations). In principle, HMC should not leave this support since the gradient diverges at the boundary, but in practice the leapfrog approximation may lead the path outside. I would have (naïvely?) suggested rejecting moves when this happens and starting again, but the authors consider that proper choices of the calibration factors of HMC can avoid this problem. Which seems to induce a practical issue by turning the algorithm into an adaptive version.
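The naive alternative mentioned above (reject the move whenever the leapfrog path exits the support) can be sketched on a toy target. This is not the authors' algorithm and involves no empirical likelihood: the target is an assumed Beta(2,2) density on (0, 1), chosen only because its log-gradient also diverges at the boundary, and the step size and number of leapfrog steps are arbitrary.

```python
import math
import random


def log_target(x):
    """Unnormalised Beta(2,2) log-density on (0, 1)."""
    return math.log(x) + math.log(1.0 - x)


def grad_log_target(x):
    """Gradient of the log-density; it diverges at 0 and 1."""
    return 1.0 / x - 1.0 / (1.0 - x)


def in_support(x):
    return 0.0 < x < 1.0


def hmc_step(x, step, n_leapfrog, rng):
    """One HMC move; returns (new state, flag set when the path left (0, 1))."""
    p = rng.gauss(0.0, 1.0)
    h_old = -log_target(x) + 0.5 * p * p
    x_new = x
    p_new = p + 0.5 * step * grad_log_target(x_new)  # initial half step
    for i in range(n_leapfrog):
        x_new += step * p_new
        if not in_support(x_new):
            return x, True  # path exited the support: reject and stay put
        scale = 1.0 if i < n_leapfrog - 1 else 0.5  # final half step
        p_new += scale * step * grad_log_target(x_new)
    h_new = -log_target(x_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new, False
    return x, False


rng = random.Random(2)
x, chain, n_exits = 0.5, [], 0
for _ in range(4000):
    x, exited = hmc_step(x, step=0.05, n_leapfrog=10, rng=rng)
    n_exits += exited
    chain.append(x)

chain_mean = sum(chain) / len(chain)
print(f"mean = {chain_mean:.3f}, rejections from leaving support = {n_exits}")
```

Since a trajectory that exits the support does so whether it is run forwards or backwards, rejecting it should keep the move reversible, and no tuning of the calibration factors is required.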
As a last point, I would have enjoyed seeing a comparison of the performance against our (A)BCel version, which would have been straightforward to implement in the simple examples handled by the paper. (This could be a neat undergraduate project for next year!)
Jonathan Harrison and Ruth Baker (Oxford University) arXived this morning a paper on the optimal combination of summaries for ABC, in the sense of deriving the proper weights in a Euclidean distance involving all the available summaries. The idea is to find the weights that lead to the maximal distance between prior and posterior, in a way reminiscent of Bernardo’s (1979) maximal information principle. Plus a sparsity penalty à la Lasso. The associated algorithm is sequential in that the weights are updated at each iteration. The paper does not get into theoretical justifications but considers instead several examples with limited numbers of both parameters and summary statistics. Which may highlight the limitations of the approach in that handling (and eliminating) a large number of parameters may prove impossible this way, when compared with optimisation methods like random forests. Or summary-free distances between empirical distributions like the Wasserstein distance.
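To show where such weights enter, here is a minimal sketch of ABC rejection with a weighted Euclidean distance, comparing two fixed weight vectors by hand. It does not implement the authors' optimisation of the weights, their sparsity penalty or their sequential scheme. The normal-mean model, the uniform prior, the pure-noise second summary and the weighting convention (weights multiplying squared differences) are all assumptions.

```python
import math
import random
import statistics


def weighted_distance(s, s_obs, weights):
    """Weighted Euclidean distance between simulated and observed summaries."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, s, s_obs)))


def simulate_summaries(theta, n_obs, rng):
    """Two summaries: the mean of n_obs N(theta, 1) draws, and pure noise."""
    s_mean = rng.gauss(theta, 1.0 / math.sqrt(n_obs))  # informative
    s_noise = rng.gauss(0.0, 1.0)                      # unrelated to theta
    return (s_mean, s_noise)


def abc_rejection(s_obs, weights, n_sim, keep_frac, rng):
    """Keep the prior draws whose summaries fall closest to s_obs."""
    draws = []
    for _ in range(n_sim):
        theta = rng.uniform(-5.0, 5.0)  # assumed uniform prior
        s = simulate_summaries(theta, 20, rng)
        draws.append((weighted_distance(s, s_obs, weights), theta))
    draws.sort()
    n_keep = max(1, int(keep_frac * n_sim))
    return [theta for _, theta in draws[:n_keep]]


rng = random.Random(3)
s_obs = (1.0, 0.0)
post_informative = abc_rejection(s_obs, (1.0, 0.0), 5000, 0.05, rng)
post_noise = abc_rejection(s_obs, (0.0, 1.0), 5000, 0.05, rng)

sd_informative = statistics.pstdev(post_informative)
sd_noise = statistics.pstdev(post_noise)
print(f"sd with weight on the mean: {sd_informative:.2f}")
print(f"sd with weight on the noise: {sd_noise:.2f}")
```

Putting all the weight on the noise summary returns what is essentially a prior sample, while weighting the informative summary concentrates the accepted values. A criterion based on the distance between prior and posterior would presumably favour the latter.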