## Savage-Dickey supermodels

Posted in Books, Mountains, pictures, Statistics, Travel, University life on September 13, 2016 by xi'an

A. Mootoovaloo, B. Bassett, and M. Kunz just arXived a paper on the computation of Bayes factors by the Savage-Dickey representation through a supermodel (or encompassing model). (I wonder why Savage-Dickey is so popular in astronomy and cosmology statistical papers and not so much elsewhere.) Recall that the trick is to write the Bayes factor in favour of the null model against the encompassing model as the ratio of the posterior and the prior densities of the tested parameter (thus eliminating nuisance or common parameters), evaluated at its null value,

$B_{01} = \dfrac{\pi(\varphi^0\mid x)}{\pi(\varphi^0)}.$

This identity holds modulo some continuity constraints on the prior density, and under the assumption that the conditional prior on the nuisance parameters is the same under the null model and the encompassing model [given the null value φ⁰]. If this sounds confusing or even shocking from a mathematical perspective, check the numerous earlier entries on this topic on the 'Og!

The supermodel created by the authors is a mixture of the original models, as in our paper, and… hold the presses!, it is a mixture of the likelihood functions, as in Phil O'Neill's and Theodore Kypraios' paper, which is not mentioned in the current paper but obviously should be. In this representation, the posterior distribution on the mixture weight α is a linear function of α involving both evidences, α(m¹-m²)+m², times the artificial prior on α. The resulting estimator of the Bayes factor thus shares features with bridge sampling, reversible jump, and the importance sampling version of nested sampling we developed in our Biometrika paper, in addition to O'Neill and Kypraios' solution.
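To see how the Bayes factor can be read off this linear weight posterior, here is a minimal numeric sketch with a uniform prior on α and made-up evidences m¹ and m²; the endpoint density ratio recovers m¹/m² = B¹²:

```python
import numpy as np

# Sketch of the supermodel weight posterior, with hypothetical
# evidences m1, m2 (made up for illustration) and a uniform prior:
# p(alpha | x) is proportional to alpha*(m1 - m2) + m2 on [0, 1].
m1, m2 = 3.0e-4, 1.2e-4

alpha = np.linspace(0.0, 1.0, 1001)
post = alpha * (m1 - m2) + m2
post /= post.mean()                 # normalise over the unit interval

# The Bayes factor is recovered from the endpoint densities:
# p(alpha=1 | x) / p(alpha=0 | x) = m1 / m2 = B12
print(post[-1] / post[0])           # ≈ 2.5 here
```

Of course, in practice m¹ and m² are unknown; the point of the supermodel MCMC is precisely to sample α (along with the model parameters) so that this ratio can be estimated from the chain.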

The following quote is inaccurate, since the MCMC algorithm requires simulating the parameters of the compared models in realistic settings, hence replacing the multidimensional integrals with Monte Carlo approximations.

“Though we have a clever way of avoiding multidimensional integrals to calculate the Bayesian Evidence, this new method requires very efficient sampling and for a small number of dimensions is not faster than individual nested sampling runs.”

I actually wonder at the sheer rationale of running an intensive MCMC sampler in such a setting, when the weight α is completely artificial. It is only used to jump from one model to the next, which sounds quite inefficient when compared with simulating from both models separately and independently. This approach can also be seen as a special case of Carlin and Chib's (1995) alternative to reversible jump. Using instead the Savage-Dickey representation is of course infeasible, which makes the overall reference to this method rather inappropriate in my opinion. Further, the examples processed in the paper all involve (naturally) embedded models where the original Savage-Dickey approach applies. Creating an additional model to apply a pseudo-Savage-Dickey representation does not sound very compelling…

Incidentally, the paper also includes a discussion of a weird notion, the likelihood of the Bayes factor, B¹², which is most strangely plotted as a distribution over B¹². The only other place I have met this notion is in Murray Aitkin's book. Something's unclear there, or in my head!

“One of the fundamental choices when using the supermodel approach is how to deal with common parameters to the two models.”

This is an interesting question, although maybe not so relevant for the Bayes factor issue, where it should not matter. However, as in our paper, multiplying the number of parameters in the encompassing model may hinder the convergence of the MCMC chain or reduce the precision of the approximation of the Bayes factor. Again, from a Bayes factor perspective this does not matter [while it does from our perspective].

## ABC model choice not to be trusted [2]

Posted in R, Statistics on January 28, 2011 by xi'an

As we were completing our arXiv summary about ABC model choice, we were helpfully pointed to a recent CRiSM tech report by X. Didelot, R. Everitt, A. Johansen and D. Lawson on Likelihood-free estimation of model evidence. This paper is closely related to our study of the performance of the ABC approximation to the Bayes factor, deriving in particular the limiting behaviour of the ratio,

$B_{12}(x) = \dfrac{g_1(x)}{g_2(x)}\,B^S_{12}(x).$

However, Didelot et al. reach the opposite conclusion from ours, namely that the problem can be solved by a sufficiency argument. Their point is that, when comparing models within exponential families (which is the natural realm for sufficient statistics), it is always possible to build an encompassing model with a sufficient statistic that remains sufficient across models. This construction of Didelot et al. is correct from a mathematical perspective, as seen for instance in the Poisson versus geometric example we first mentioned in Grelaud et al. (2009): adding

$\prod_{i=1}^n x_i!$

to the sum of the observations to form an enlarged sufficient statistic produces a ratio $g_1/g_2$ equal to 1.
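To make this concrete, here is a hedged sketch of the Poisson-versus-geometric computation, under hypothetical priors λ ~ Exp(1) and p ~ U(0,1) (chosen for tractability, not necessarily those of Grelaud et al.): both evidences, hence the exact Bayes factor, depend on the data only through S = Σxᵢ and P = ∏xᵢ!, the cross-model sufficient statistic.

```python
import math
from scipy.special import gammaln

def log_evidence_poisson(n, S, log_P):
    # Poisson(lambda) likelihood with lambda ~ Exp(1) prior:
    # m1 = int lambda^S e^{-n lambda} / P * e^{-lambda} d lambda
    #    = Gamma(S+1) / ((n+1)^(S+1) * P)
    return gammaln(S + 1) - (S + 1) * math.log(n + 1) - log_P

def log_evidence_geometric(n, S):
    # Geometric(p) likelihood p^n (1-p)^S with p ~ U(0,1) prior:
    # m2 = Beta(n+1, S+1) = n! S! / (n+S+1)!
    return gammaln(n + 1) + gammaln(S + 1) - gammaln(n + S + 2)

x = [2, 0, 3, 1, 1, 4, 0, 2]                 # toy count data
n, S = len(x), sum(x)
log_P = sum(gammaln(xi + 1) for xi in x)     # log prod x_i!
log_B12 = log_evidence_poisson(n, S, log_P) - log_evidence_geometric(n, S)
print(math.exp(log_B12))
```

Any permutation (or any other dataset with the same pair (S, P)) yields the same Bayes factor, which is exactly the cross-model sufficiency point being made.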

Nonetheless, we do not think this encompassing property has a direct impact on the performance of ABC model choice. In practice, complex models do not enjoy sufficient statistics (if only because the overwhelming majority of them are not exponential families, with the notable exception of Gibbs random fields, where the above agreement is derived). There is therefore a strict loss of information in using ABC model choice, due to the call both to insufficient statistics and to non-zero tolerances. Looking at what happens in the limiting case when one relies on a common sufficient statistic is a formal study that sheds light on the potentially huge discrepancy between the ABC-based Bayes factor and the true Bayes factor. This is why we consider that finding a solution in this formal case, while a valuable extension of the Gibbs random fields case, does not directly help towards understanding the discrepancy found in non-exponential complex models.