## Bayesian composite likelihood

Posted in Books, Statistics, University life on February 11, 2016 by xi'an

“…the pre-determined weights assigned to the different associations between observed and unobserved values represent strong a priori knowledge regarding the informativeness of clues. A poor choice of weights will inevitably result in a poor approximation to the “true” Bayesian posterior…”

Last Xmas, Alexis Roche arXived a paper on Bayesian inference via composite likelihood. I find the paper quite interesting in that [and only in that] it defends the innovative notion of writing a composite likelihood as a pool of opinions about some features of the data. Recall that each term in the composite likelihood is a marginal likelihood for some projection z=f(y) of the data y. As in ABC settings, it is rare to derive closed-form expressions for those marginals. The composite likelihood is parameterised by powers of those components. Each component is associated with an expert, whose weight reflects its importance. The sum of the powers is constrained to be equal to one, even though I do not understand why the dimensions of the projections play no role in this constraint. Simplicity is advanced as an argument, which sounds rather weak…

Even though this may be infeasible in any realistic problem, it would be more coherent to see the weights as producing the best Kullback approximation to the true posterior. Or to use a prior on the weights and estimate them along with the parameter θ. The former could be incorporated into the latter following the approach of Holmes & Walker (2013). While the ensuing discussion is most interesting, it fails to connect the different components in terms of the (joint) information they bring about the parameters. Especially because the weights are assumed to be given rather than inferred. Especially when they depend on θ. I also wonder why the variational Bayes interpretation is not exploited any further. And I see no clear way to exploit this perspective in an ABC environment.
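
To make the pooling of opinions concrete, here is a minimal sketch under a loudly stated assumption that is not taken from Roche's paper: each expert's projection z_k is Gaussian around θ with known variance s2_k, so that the weighted composite likelihood ∏_k N(z_k; θ, s2_k)^{w_k}, with powers summing to one, gives a Gaussian posterior in closed form under a flat prior. The function names and numbers are illustrative only, and the grid computation merely checks the closed form.

```python
import numpy as np

def pooled_gaussian_posterior(z, s2, w):
    """Flat-prior posterior (mean, variance) of theta under the weighted composite
    likelihood prod_k N(z_k; theta, s2_k)^{w_k}, ASSUMING Gaussian projections."""
    z, s2, w = (np.asarray(a, dtype=float) for a in (z, s2, w))
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("the weights (powers) are constrained to sum to one")
    prec = np.sum(w / s2)                 # expert k contributes w_k / s2_k to the precision
    return np.sum(w * z / s2) / prec, 1.0 / prec

def grid_posterior(z, s2, w, grid):
    """Same posterior, obtained by normalising exp(composite log-likelihood) on a uniform grid."""
    loglik = sum(wk * (-0.5 * (zk - grid) ** 2 / vk) for zk, vk, wk in zip(z, s2, w))
    p = np.exp(loglik - loglik.max())
    p /= p.sum()
    mean = np.sum(grid * p)
    return mean, np.sum((grid - mean) ** 2 * p)

# hypothetical experts: one summarises a mean of 100 unit-variance observations,
# the other a single observation
z, s2, w = [0.3, 1.0], [1 / 100, 1.0], [0.7, 0.3]
print(pooled_gaussian_posterior(z, s2, w))
print(grid_posterior(z, s2, w, np.linspace(-2.0, 3.0, 20001)))
```

In this toy case the pooled posterior variance is 1/70.3, larger than the 1/100 the first expert alone would give: since the powers sum to one, the pooled precision is a convex combination of the experts' precisions, whatever the dimension of the projection each expert summarises.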

## Bruce Lindsay (March 7, 1947 — May 5, 2015)

Posted in Books, Running, Statistics, Travel, University life on May 22, 2015 by xi'an

## ABC with composite score functions

Posted in Books, pictures, Statistics, University life on December 12, 2013 by xi'an

My friends Erlis Ruli, Nicola Sartori and Laura Ventura from Università degli Studi di Padova have just arXived a new paper entitled Approximate Bayesian Computation with composite score functions. While the paper provides a survey of composite likelihood methods, its core idea is to use the score function (of the composite likelihood) as the summary statistic,

$\dfrac{\partial\,c\ell(\theta;y)}{\partial\,\theta},$

when evaluated at the maximum composite likelihood estimate for the observed data. In the specific (but unrealistic) case of an exponential family, an ABC based on the score is asymptotically (i.e., as the tolerance ε goes to zero) exact. The choice of the composite likelihood thus induces a natural summary statistic and, as in our empirical likelihood paper, where we also use the score of a composite likelihood, the composite likelihoods that are available for computation are usually few, thus leading to an automated choice of a summary statistic.
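
A minimal sketch of the construction, under assumptions chosen for brevity rather than taken from the paper (whose MA(2) example uses consecutive triplets): a stationary AR(1) series with unit marginal variance and lag-one correlation ρ, a pairwise composite likelihood built from consecutive pairs (y_t, y_{t+1}), the composite score at the observed maximum composite likelihood estimate as the scalar summary statistic, a uniform prior on (-1, 1), and a tolerance set as a quantile of the simulated distances. All function names are hypothetical.

```python
import numpy as np

def simulate_ar1(rho, n, rng):
    """Stationary AR(1) with unit marginal variance: y_t = rho*y_{t-1} + sqrt(1-rho^2)*e_t."""
    y = np.empty(n)
    y[0] = rng.standard_normal()
    e = rng.standard_normal(n) * np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + e[t]
    return y

def pair_sums(y):
    """Number of consecutive pairs, sum of a^2 + b^2, and sum of a*b over pairs (a, b)."""
    a, b = y[:-1], y[1:]
    return len(a), np.sum(a**2 + b**2), np.sum(a * b)

def pairwise_cl(r, y):
    """Composite log-likelihood: product of bivariate N(0, [[1, r], [r, 1]]) pair densities."""
    m, S, P = pair_sums(y)
    return -m * np.log(2 * np.pi) - 0.5 * m * np.log(1 - r**2) - 0.5 * (S - 2 * r * P) / (1 - r**2)

def cl_score(r, y):
    """Composite score d cl(r; y) / dr, in closed form."""
    m, S, P = pair_sums(y)
    return m * r / (1 - r**2) - (r * S - (1 + r**2) * P) / (1 - r**2) ** 2

def abc_composite_score(y_obs, n_sims=5000, keep=0.02, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(-0.99, 0.99, 1981)
    r_hat = grid[np.argmax([pairwise_cl(r, y_obs) for r in grid])]  # MCLE for the observed data
    s_obs = cl_score(r_hat, y_obs)                # close to zero, up to the grid resolution
    rhos = rng.uniform(-1.0, 1.0, n_sims)         # uniform prior on the stationarity region
    # summary of each pseudo-dataset: the composite score evaluated at the fixed r_hat
    dist = np.array([abs(cl_score(r_hat, simulate_ar1(r, len(y_obs), rng)) - s_obs)
                     for r in rhos])
    return rhos[dist <= np.quantile(dist, keep)]  # keep the closest fraction of simulations

y = simulate_ar1(0.6, 200, np.random.default_rng(1))
post = abc_composite_score(y, seed=2)
print(len(post), post.mean())
```

For this particular model, the expected composite score at the fixed plug-in value is linear and increasing in ρ, which is why a single scalar summary is enough to separate parameter values here.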

An interesting (common) feature in most examples found in this paper is that comparisons are made between ABC using the (truly) sufficient statistic and ABC based on the pairwise score function, which essentially relies on the very same statistics. So the difference, when there is a difference, pertains to the choice of a different combination of the summary statistics or, somewhat equivalently, to the choice of a different distance function. One of the examples starts from our MA(2) toy example in the 2012 survey in Statistics and Computing. The composite likelihood is then based on the marginal densities of consecutive triplets. As shown by the picture below, the composite version improves to some extent upon the original ABC solution using three autocorrelations.

A suggestion I would have about a refinement of the proposed method deals with the distance utilised in the paper, namely the sum of the absolute differences between the statistics. Indeed, this sum is not scaled at all, neither for regular ABC nor for composite ABC, while the composite likelihood perspective provides, in addition to the score, a natural metric through the matrix A(θ) [defined on page 12]. So I would suggest comparing the performances of the methods under this rescaling instead since, in my opinion and contrary to a remark on page 13, it matters in some (many?) settings where the amount of information brought by the composite model varies widely from one parameter to the next.
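
The suggested rescaling can be sketched as a quadratic-form (Mahalanobis-type) distance. The paper's A(θ) is not reproduced here: as an assumption, the empirical covariance of simulated score vectors stands in for it, and the two-dimensional scores below are synthetic. Unlike the unscaled sum of absolute differences, such a distance does not change when one score component is expressed in different units.

```python
import numpy as np

def l1_distance(S, s_obs):
    """Unscaled sum of absolute differences, one value per row of S."""
    return np.abs(S - s_obs).sum(axis=1)

def scaled_distance(S, s_obs, A):
    """sqrt((s - s_obs)^T A^{-1} (s - s_obs)) for each row s of S, with A a d x d
    positive-definite matrix (here only a stand-in for the composite-likelihood metric)."""
    D = S - s_obs
    sol = np.linalg.solve(A, D.T)                 # column i holds A^{-1} (s_i - s_obs)
    q = np.einsum("ij,ji->i", D, sol)             # quadratic form for each simulation
    return np.sqrt(np.maximum(q, 0.0))            # guard against tiny negative round-off

rng = np.random.default_rng(0)
S = rng.standard_normal((1000, 2)) * [1.0, 50.0]  # synthetic scores, second far more spread out
s_obs = np.zeros(2)
A_hat = np.cov(S, rowvar=False)                   # ASSUMED stand-in for A(theta)

scale = np.array([1.0, 100.0])                    # change the units of the second component
d_scaled = scaled_distance(S, s_obs, A_hat)
d_rescaled = scaled_distance(S * scale, s_obs * scale, np.cov(S * scale, rowvar=False))
print(np.allclose(d_scaled, d_rescaled))                                            # True
print(np.allclose(l1_distance(S, s_obs), l1_distance(S * scale, s_obs * scale)))   # False
```

With the unscaled distance, the component with the largest spread dominates which simulations are retained; the quadratic form puts all components on the same footing.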

## Bayesian computation via empirical likelihood on line. Early.

Posted in Statistics, University life on January 16, 2013 by xi'an

Our paper on using empirical likelihood for Bayesian computation (with Kerrie Mengersen and Pierre Pudlo) has been accepted by PNAS [after we removed the A from ABCel!], which is terrific news! It has already appeared on-line as an early edition in the issue of January 7. Which is also terrific! (Unfortunately, and contrary to the previous PNAS paper on ABC model choice, it is not open access, as the cost was just too high.)

## Bayesian computation with empirical likelihood and no A

Posted in Statistics, University life on December 7, 2012 by xi'an

We just resubmitted our paper to PNAS about using empirical likelihood for conducting Bayesian computation. Although this is an approximation as well, we removed the A (for approximation) from the title and from the name of the method, BCel, to comply with a referee’s request and also account for several comments during our seminars that this was not ABC! We can see the point in those comments, namely that ABC is understood as a corpus of methods that rely on the simulation of pseudo-datasets to compensate for the missing likelihood, while empirical likelihood stands as another route bypassing this difficulty… I keep my fingers crossed that this ultimate revision is convincing enough for the PNAS board!
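
Here is a minimal sketch of the idea in the simplest setting, which is an assumption and not one of the paper's examples: the parameter is a population mean μ defined by the single moment condition E[x - μ] = 0, the empirical likelihood is Owen's profile likelihood for a mean (computed through its Lagrange multiplier), and draws from a N(0, 10²) prior are weighted by it. No pseudo-dataset is ever simulated. Function names and settings are hypothetical.

```python
import numpy as np

def log_el_mean(x, mus, iters=60):
    """Log empirical likelihood ratio log prod_i (n p_i) under E[x - mu] = 0, vectorised
    over candidate means mus; equals -inf when mu lies outside the convex hull of x."""
    d = x[None, :] - mus[:, None]                 # deviations, shape (len(mus), n)
    dmax, dmin = d.max(axis=1), d.min(axis=1)
    inside = (dmax > 0) & (dmin < 0)              # EL is zero unless mu is strictly inside
    out = np.full(mus.shape, -np.inf)
    d = d[inside]
    # lambda solves sum_i d_i / (1 + lambda d_i) = 0 on the interval where every
    # 1 + lambda d_i > 0; the sum decreases in lambda there, so bisection is safe
    lo, hi = -1.0 / dmax[inside], -1.0 / dmin[inside]
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        g = (d / (1.0 + lam[:, None] * d)).sum(axis=1)
        lo, hi = np.where(g > 0, lam, lo), np.where(g > 0, hi, lam)
    lam = 0.5 * (lo + hi)
    out[inside] = -np.log1p(lam[:, None] * d).sum(axis=1)  # p_i = 1 / (n (1 + lambda d_i))
    return out

def bcel_mean(x, n_draws=20000, prior_sd=10.0, seed=0):
    """Importance sampler: prior draws of mu, weighted by their empirical likelihood."""
    rng = np.random.default_rng(seed)
    mus = rng.normal(0.0, prior_sd, n_draws)
    logw = log_el_mean(x, mus)
    w = np.exp(logw - logw.max())                 # draws outside the hull get weight zero
    return mus, w / w.sum()

x = np.random.default_rng(3).normal(1.5, 1.0, 100)
mus, w = bcel_mean(x)
print(np.sum(w * mus), x.mean())                  # weighted posterior mean vs sample mean
```

The only model input is the moment condition, which is the sense in which empirical likelihood bypasses the missing likelihood without the simulation step that characterises ABC.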

Coincidentally, Jean-Pierre Florens came to give a (Malinvaud) seminar at CREST today about semi-parametric Bayesian modelling, mixing Gaussian process priors with generalised moment conditions. This was a fairly involved talk with a lot of technical details about RKHS spaces and a mix of asymptotics and conjugate priors (somewhat empirical Bayesianish in spirit!) In a sense, it was puzzling because the unknown distribution was modelled conditional on an unknown parameter, θ, which itself was a function of this distribution. It was however quite interesting in that it managed to mix Gaussian process priors with some sort of empirical likelihood (or GMM). Furthermore, in a sort of antithesis to our approach with empirical likelihood, Florens and Simoni had a plethora of moment restrictions, which they called over-identification, and used this feature to improve the estimation of the underlying density. There were also connections with the kernel Bayes’ rule perspective of Fukumizu et al., even though I am not clear about the latter. I also got lost in the representation of the data as a point in a Hilbert space, thanks to a convolution step. (The examples involved orthogonal polynomials like Laguerre’s or Hermite’s, which made sense as the data was back to a finite dimension!) Once again, the most puzzling thing is certainly over-identification: in an empirical likelihood version, it would degrade the quality of the approximation by making it more and more peaked. It does not appear to cause such worries in Florens’ and Simoni’s perspective.

## workshop a Venezia (2)

Posted in pictures, Statistics, Travel, University life on October 10, 2012 by xi'an

I could only attend one day of the workshop on likelihood, approximate likelihood and nonparametric statistical techniques with some applications, and I wish I could have stayed a day longer (and definitely not only for the pleasure of being in Venezia!) Yesterday, Bruce Lindsay started the day with an extended review of composite likelihood, followed by recent applications of composite likelihood to clustering (I was completely unaware he had worked on the topic in the 80’s!). His talk was followed by several talks working on composite likelihood and other pseudo-likelihoods, which made me think about potential applications to ABC. During my tutorial talk on ABC, I got interesting questions on multiple testing and how to combine the different “optimal” summary statistics (answer: take all of them, it would not make sense to compare one pair with one summary statistic and another pair with another summary statistic), and on why we were using empirical likelihood rather than another pseudo-likelihood (answer: I do not have a definite answer. I guess it depends on the ease with which the pseudo-likelihood is derived and what we do with it. I would e.g. feel less confident using the pairwise composite likelihood as a substitute likelihood than as the basis for a score function.) In the final afternoon, Monica Musio presented her joint work with Phil Dawid on score functions and their connection with pseudo-likelihood and estimating equations (another possible opening for ABC), mentioning a score family developed by Hyvärinen that involves the gradient of the square-root of a density, in the best James-Stein tradition! (Plus an approach bypassing the annoying missing normalising constant.) Then, based on a joint work with Nicola Sartori and Laura Ventura, Erlis Ruli presented a 3rd-order tail approximation towards a (marginal) posterior simulation called HOTA.
As Ruli will visit me in Paris in the coming weeks, I hope I can explore the possibilities of this method when he is (t)here. Finally, Stefano Cabras discussed higher-order approximations for Bayesian point-null hypotheses (jointly with Walter Racugno and Laura Ventura), citing the (so special) Pereira and Stern loss function, which I had discussed in my post on Måns’ paper the very same day! It was thus a very informative and beneficial day for me, furthermore spent in a room overlooking the Canal Grande in the most superb location!
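
As an aside on the Hyvärinen score, here is a minimal one-dimensional sketch, using the convention S(x, q) = 2 (log q)''(x) + ((log q)'(x))², which equals 4 (√q)''(x)/√q(x) (other conventions differ by a factor of two). Since only derivatives of log q appear, the normalising constant drops out. The Gaussian example, the finite-difference derivatives, and the function names are illustrative assumptions.

```python
import numpy as np

def hyvarinen_score(logq, x, h=1e-3):
    """S(x, q) = 2 (log q)''(x) + ((log q)'(x))^2 for a univariate density known only
    up to a constant, with derivatives taken by central finite differences."""
    d1 = (logq(x + h) - logq(x - h)) / (2 * h)
    d2 = (logq(x + h) - 2 * logq(x) + logq(x - h)) / h**2
    return 2 * d2 + d1**2

def gaussian_logq(mu, s2):
    """Unnormalised Gaussian log-density: the -log(2 pi s2)/2 term is deliberately dropped."""
    return lambda x: -0.5 * (x - mu) ** 2 / s2

def fit_variance(x, grid):
    """Minimise the average Hyvarinen score over s2 on a grid, with mu set to the sample mean."""
    risks = [np.mean(hyvarinen_score(gaussian_logq(x.mean(), s2), x)) for s2 in grid]
    return grid[int(np.argmin(risks))]

x = np.random.default_rng(0).normal(2.0, 1.5, 500)
logq = gaussian_logq(2.0, 2.25)
# adding a constant to log q (i.e. rescaling q) leaves the score unchanged
print(np.allclose(hyvarinen_score(logq, x), hyvarinen_score(lambda t: logq(t) + 7.0, x), atol=1e-6))
print(fit_variance(x, np.linspace(0.5, 5.0, 4501)), np.var(x))
```

For the Gaussian family, minimising the average score recovers the sample mean and the (biased) sample variance, without the normalising constant ever being evaluated.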