## Bayesian computation with empirical likelihood and no A

Posted in Statistics, University life on December 7, 2012 by xi'an

We just resubmitted our paper to PNAS about using empirical likelihood for conducting Bayesian computation. Although this is an approximation as well, we removed the A (for approximation) from the title and from the name of the method, BCel, to comply with a referee’s request and also account for several comments during our seminars that this was not ABC! We can see the point in those comments, namely that ABC is understood as a corpus of methods that rely on the simulation of pseudo-datasets to compensate for the missing likelihood, while empirical likelihood stands as another route bypassing this difficulty… I keep my fingers crossed that this ultimate revision is convincing enough for the PNAS board!
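To make the contrast with ABC concrete, here is a minimal sketch of the idea in a toy setting: parameters are drawn from the prior and weighted by an empirical likelihood, with no pseudo-dataset ever simulated. It is not the code from the paper. The single moment constraint (the mean of the data equals θ), the normal prior, the bisection solver, and the names `log_el_mean` and `bcel_weights` are all assumptions made for illustration.

```python
import numpy as np

def log_el_mean(x, theta, n_iter=100):
    """Log empirical likelihood of theta under the constraint E[X] = theta."""
    z = np.asarray(x, dtype=float) - theta
    n = z.size
    # theta outside the convex hull of the data: no valid weights exist
    if z.min() >= 0.0 or z.max() <= 0.0:
        return -np.inf
    # The optimal weights are p_i = 1 / (n * (1 + lam * z_i)), where lam solves
    # sum(z_i / (1 + lam * z_i)) = 0 on the interval keeping all p_i positive.
    # That function is decreasing in lam, so plain bisection is enough.
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + mid * z)) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -n * np.log(n) - np.sum(np.log1p(lam * z))

def bcel_weights(x, thetas):
    """Normalised importance weights for prior draws (assumes one finite weight)."""
    logw = np.array([log_el_mean(x, t) for t in thetas])
    w = np.exp(logw - logw.max())
    return w / w.sum()

# Toy illustration (hypothetical data and prior)
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)          # observed sample
thetas = rng.normal(0.0, 5.0, size=2000)   # draws from a N(0, 5^2) prior
w = bcel_weights(x, thetas)
post_mean = np.sum(w * thetas)             # weighted posterior mean of theta
```

The weighted sample `(thetas, w)` plays the role an ABC reference table would play, except that the weights come from the empirical likelihood rather than from a distance between observed and simulated summaries.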

Coincidentally, Jean-Pierre Florens came to give a (Malinvaud) seminar at CREST today about semi-parametric Bayesian modelling, mixing Gaussian process priors with generalised moment conditions. This was a fairly involved talk with a lot of technical details about RKHSs and a mix of asymptotics and conjugate priors (somewhat empirical Bayesianish in spirit!). In a sense, it was puzzling because the unknown distribution was modelled conditional on an unknown parameter, θ, which itself was a function of this distribution. It was however quite interesting in that it managed to mix Gaussian process priors with some sort of empirical likelihood (or GMM). Furthermore, in a sort of antithesis to our approach with empirical likelihood, Florens and Simoni had a plethora of moment restrictions, which they called over-identification, and used this feature to improve the estimation of the underlying density. There were also connections with Fukumizu et al.’s kernel Bayes’ rule perspective, even though I am not clear about the latter. I also got lost here by the representation of the data as a point in a Hilbert space, thanks to a convolution step. (The examples involved orthogonal polynomials like Lagrange’s or Hermite’s, which made sense as the data was back to a finite dimension!) Once again, the most puzzling thing is certainly over-identification: in an empirical likelihood version, it would degrade the quality of the approximation by making it more and more peaked. It does not appear to cause such worries in Florens’ and Simoni’s perspective.

## kernel approximate Bayesian computation for population genetic inferences

Posted in Statistics, University life on May 22, 2012 by xi'an

A new posting about ABC on arXiv by Shigeki Nakagome, Kenji Fukumizu, and Shuhei Mano, entitled kernel approximate Bayesian computation for population genetic inferences, argues for an improvement brought by the use of a reproducing kernel Hilbert space (RKHS) perspective in ABC methodology, when compared with more standard ABC relying on a rather arbitrary choice of summary statistics and metric. However, I feel that the paper does not substantially defend this point, only using a simulation experiment to compare mean square errors. In particular, the claim of consistency is unsubstantiated, as is the counterpoint that “conventional ABC did not have consistency” (page 14) [and several papers, including the just published Fearnhead and Prangle, claim the opposite]. Furthermore, a considerable amount of space is taken in the paper by the description of the existing ABC algorithms, while the complete version of the new kernel ABC-RKHS algorithm is missing. In particular, the coverage of kernel Bayes is too sketchy to be comprehensible [at least to me] without additional study. Actually, I do not get the notion of kernel Bayes’ rule, which seems defined only in terms of expectations

$\mathbb{E}[f(\theta)|s]=\sum_i w_i f(\theta_i),$

where the weights, which depend on the observed $s$, are the ridge-like quantities

$w_i=\sum_j (\mathbf{G}_S + n\epsilon_n \mathbf{I}_n)^{-1}_{ij}k(s_j,s)$

where the parameters $\theta_i$ are generated from the prior, the data $s_i$ are generated from the sampling distribution given $\theta_i$, and the matrix $\mathbf{G}_S$ is made of the $k(s_i,s_j)$’s. The surrounding Hilbert space presentation does not seem particularly relevant, esp. in population genetics… I am also under the impression that the choice of the kernel function $k(\cdot,\cdot)$ is as important as the choice of the metric in regular ABC, although this is not discussed in the paper, since it implies [among other things] the choice of a metric. The implementation uses a Gaussian kernel and a Euclidean metric, which involves assumptions on the homogeneous nature of the components of the summary statistics or of the data. Similarly, the “regularization” parameter $\epsilon_n$ needs to be calibrated and the paper is unclear about this, apparently picking the parameter that “showed the smallest MSEs” (page 10), which cannot be called a calibration. (There is a rather unimportant proposition about concentration of information on page 6 whose proof relies on two densities being ordered, see top of page 7.)
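For what it is worth, the weighted expectation above can be sketched in a few lines, reading the weights as a kernel ridge regression of the simulated parameters on the simulated summaries, with the kernel vector evaluated between each simulated $s_j$ and the observed $s$. This is my reading, not the authors' code. The toy normal model, the bandwidth, the value of $\epsilon_n$, and the function names `gaussian_gram` and `kernel_abc_weights` are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a and rows of b (Euclidean metric)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def kernel_abc_weights(s_sim, s_obs, bandwidth, eps):
    """Weights w = (G_S + n * eps * I_n)^{-1} k_obs, with k_obs[j] = k(s_j, s_obs)."""
    n = s_sim.shape[0]
    G = gaussian_gram(s_sim, s_sim, bandwidth)
    k_obs = gaussian_gram(s_sim, s_obs[None, :], bandwidth)[:, 0]
    return np.linalg.solve(G + n * eps * np.eye(n), k_obs)

# Toy model (hypothetical): theta ~ N(0, 1), summary s | theta ~ N(theta, 0.1),
# so the exact posterior mean given s is (10 / 11) * s.
rng = np.random.default_rng(1)
n = 500
theta = rng.normal(0.0, 1.0, size=n)                      # prior draws
s_sim = rng.normal(theta, np.sqrt(0.1)).reshape(-1, 1)    # simulated summaries
s_obs = np.array([1.0])                                   # observed summary

w = kernel_abc_weights(s_sim, s_obs, bandwidth=0.5, eps=1e-3)
post_mean = np.sum(w * theta)   # estimate of E[theta | s_obs], with f the identity
```

Note that these weights need not be positive nor sum to one, and that both `bandwidth` and `eps` have to be chosen by hand here, which is exactly the calibration issue raised above.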