## estimating the marginal likelihood (or an information criterion)

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on December 28, 2019 by xi'an

Tory Imai (from Kyoto University) arXived a paper last summer on what first looked like a novel approximation of the marginal likelihood. Based on the variance of thermodynamic integration. The starting argument is that there exists a power 0<t⁰<1 such that the expectation of the logarithm of the product of the prior by the likelihood to the power t⁰ or t⁰-powered likelihood  is equal to the standard log-marginal

$\log m(x) = \mathbb{E}^{t^0}[ \log f(X|\theta) ]$

when the expectation is under the posterior corresponding to the t⁰-powered likelihood (rather than the full likelihood). By an application of the mean value theorem. Watanabe’s (2013) WBIC replaces the optimum t⁰ with 1/log(n), n being the sample size. The issue in terms of computational statistics is of course that the error of WBIC (against the true log m(x)) is only characterised as an order of n.

The second part of the paper is rather obscure to me, as the motivation for the real log canonical threshold is missing, even though the quantity is connected with the power likelihood. And the DIC effective dimension. It then goes on to propose a new approximation of sBIC, where s stands for singular, of Drton and Plummer (2017) which I had missed (and may ask my colleague Martin later today at Warwick!). Quickly reading through the later however brings explanations about the real log canonical threshold being simply the effective dimension in Schwarwz’s BIC approximation to the log marginal,

$\log m(x) \approx= \log f(x|\hat{\theta}_n) - \lambda \log n +(m-1)\log\log n$

(as derived by Watanabe), where m is called the multiplicity of the real log canonical threshold. Both λ and m being unknown, Drton and Plummer (2017) estimate the above approximation in a Bayesian fashion, which leads to a double indexed marginal approximation for a collection of models. Since this thread leads me further and further from a numerical resolution of the marginal estimation, but brings in a different perspective on mixture Bayesian estimation, I will return to this highly  in a later post. The paper of Imai discusses a different numerical approximation to sBIC, With a potential improvement in computing sBIC. (The paper was proposed as a poster to BayesComp 2020, so I am looking forward discussing it with the author.)

## 19 dubious ways to compute the marginal likelihood

Posted in Books, Statistics with tags , , , , , , , , , , on December 11, 2018 by xi'an

A recent arXival on nineteen different [and not necessarily dubious!] ways to approximate the marginal likelihood of a given topology of a philogeny tree that reminded me of our San Antonio survey with Jean-Michel Marin. This includes a version of the Laplace approximation called Laplus (!), accounting for the fact that branch lengths on the tree are positive but may have a MAP at zero. Using a Beta, Gamma, or log-Normal distribution instead of a Normal. For importance sampling, the proposals are derived from either the Laplus (!) approximate distributions or from the variational Bayes solution (based on an Normal product). Harmonic means are still used here despite the obvious danger, along with a defensive version that mixes prior and posterior. Naïve Monte Carlo means simulating from the prior, while bridge sampling seems to use samples from prior and posterior distributions. Path and modified path sampling versions are those proposed in 2008 by Nial Friel and Tony Pettitt (QUT). Stepping stone sampling appears like another version of path sampling, also based on a telescopic product of ratios of normalising constants, the generalised version relying on a normalising reference distribution that need be calibrated. CPO and PPD in the above table are two versions based on posterior predictive density estimates.

When running the comparison between so many contenders, the ground truth is selected as the values returned by MrBayes in a massive MCMC experiment amounting to 7.5 billions generations. For five different datasets. The above picture describes mean square errors for the probabilities of split, over ten replicates [when meaningful], the worst case being naïve Monte Carlo, with nested sampling and harmonic mean solutions close by. Similar assessments proceed from a comparison of Kullback-Leibler divergences. With the (predicatble?) note that “the methods do a better job approximating the marginal likelihood of more probable trees than less probable trees”. And massive variability for the poorest methods:

The comparison above does not account for time and since some methods are deterministic (and fast) there is little to do about this. The stepping steps solutions are very costly, while on the middle range bridge sampling outdoes path sampling. The assessment of nested sampling found in the conclusion is that it “would appear to be an unwise choice for estimating the marginal likelihoods of topologies, as it produces poor approximate posteriors” (p.12). Concluding at the Gamma Laplus approximation being the winner across all categories! (There is no ABC solution studied in this paper as the model likelihood can be computed in this setup, contrary to our own setting.)