## Bayesian composite likelihood

“…the pre-determined weights assigned to the different associations between observed and unobserved values represent strong a priori knowledge regarding the informativeness of clues. A poor choice of weights will inevitably result in a poor approximation to the “true” Bayesian posterior…”

**L**ast Xmas, Alexis Roche arXived a paper on Bayesian inference via composite likelihood. I find the paper quite interesting in that [and only in that] it defends the innovative notion of writing a composite likelihood as a pool of opinions about some features of the data. Recall that each term in the composite likelihood is a marginal likelihood for some projection z=f(y) of the data y. As in ABC settings, although it is rare to derive closed-form expressions for those marginals. The composite likelihood is parameterised by powers of those components. Each component is associated with an expert, whose weight reflects the importance. The sum of the powers is constrained to be equal to one, even though I do not understand why the dimensions of the projections play no role in this constraint. Simplicity is advanced as an argument, which sounds rather weak… Even though this may be infeasible in any realistic problem, it would be more coherent to see the weights as producing the best Kullback approximation to the true posterior. Or to use a prior on the weights and estimate them along the parameter θ. The former could be incorporated into the later following the approach of Holmes & Walker (2013). While the ensuing discussion is most interesting, it remains missing in connecting the different components in terms of the (joint) information brought about the parameters. Especially because the weights are assumed to be given rather than inferred. Especially when they depend on θ. I also wonder why the variational Bayes interpretation is not exploited any further. And see no clear way to exploit this perspective in an ABC environment.

September 14, 2016 at 2:36 pm

Dear Xi’An,

Thank you for your comment about my working paper, which I just came across today. I thought I would let you know that there is an update of this work (http://arxiv.org/abs/1512.07678) that discusses an idea similar, although not quite exactly identical, to your suggestion of optimizing the weights against the KL divergence from the “true” posterior, see Section 5. Any feedback is warmly welcome.

With best regards,

Alexis