## read paper [in Bristol]

**I** went to give a seminar in Bristol last Friday and chose to present the testing-with-mixtures paper. As we are busy working on the revision, I was eagerly looking for comments and criticisms that could strengthen this new version. As it happened, the (Bristol) Bayesian Cake (Reading) Club had chosen our paper for discussion, two weeks in a row!, hence the title!, and I got invited to join the group the morning prior to the seminar! This was, of course, most enjoyable and relaxed, including a home-made cake!, but also quite helpful in assessing our arguments in the paper. One point of contention, or at least of discussion, was the common parametrisation between the components of the mixture. Although all parametrisations are equivalent from a *single*-component point of view, I can [almost] see why using a mixture with the same parameter value on all components may impose some unsuspected constraint on that parameter, even when the parameter is *the same moment* for both components. This still sounds like a minor counterpoint in that the weight should converge to either zero or one and hence eventually favour the posterior on the parameter corresponding to the “true” model.
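This convergence of the weight can be seen in a toy sketch (not the paper's actual implementation): an encompassing mixture of two components sharing a location parameter θ, with a Beta(1/2,1/2) prior on the weight α and a plain random-walk Metropolis sampler. All specifics below (Normal-versus-Cauchy components, step sizes, sample size) are illustrative assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=100)  # data generated from the Normal component

def log_post(theta, alpha, data):
    """Log-posterior for the mixture alpha*N(theta,1) + (1-alpha)*Cauchy(theta)."""
    ln = -0.5 * (data - theta) ** 2 - 0.5 * np.log(2 * np.pi)  # Normal log-density
    lc = -np.log(np.pi * (1 + (data - theta) ** 2))            # Cauchy log-density
    m = np.maximum(ln, lc)                                     # for numerical stability
    ll = np.sum(m + np.log(alpha * np.exp(ln - m) + (1 - alpha) * np.exp(lc - m)))
    # Beta(1/2,1/2) prior on alpha, flat prior on theta (up to constants)
    return ll - 0.5 * np.log(alpha) - 0.5 * np.log(1 - alpha)

theta, alpha = 0.0, 0.5
cur = log_post(theta, alpha, x)
draws = []
for t in range(20000):
    th_p = theta + 0.2 * rng.normal()
    al_p = alpha + 0.05 * rng.normal()
    if 0.0 < al_p < 1.0:  # reject proposals outside the unit interval
        prop = log_post(th_p, al_p, x)
        if np.log(rng.uniform()) < prop - cur:
            theta, alpha, cur = th_p, al_p, prop
    if t >= 10000:  # discard burn-in
        draws.append(alpha)

print(np.mean(np.array(draws)))  # posterior mass of alpha drifts toward the Normal end
```

With data from the Normal component, the posterior on α piles up near one, favouring the posterior on θ under the better-fitting model.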

Another point that was raised during the discussion is the behaviour of the method under misspecification, or in an M-open framework: when neither model is correct, does the weight still converge to the boundary associated with the closest model (as I believe), or does a convexity argument produce a non-zero weight as its limit (as hinted by one example in the paper)? I had thought very little about this and hence had just as little to argue, though this does not sound to me like the primary reason for conducting tests, especially in a Bayesian framework. If one is uncertain about both models to be compared, one should have an alternative at the ready! Or use a non-parametric version, which is a direction we need to explore deeper before deciding it is coherent and convergent!

A third point of discussion was my argument that mixtures allow us to rely on the same parameter and hence the same prior, whether proper or not, while Bayes factors are less clearly open to this interpretation. This was not uniformly accepted!

Thinking afresh about this approach also led me to broaden my perspective on the use of the posterior distribution of the weight(s) α: while previously I had taken those weights mostly as a proxy to the posterior probabilities, to be calibrated by pseudo-data experiments, as for instance in Figure 9, I now perceive them primarily as the portion of the data in agreement with the corresponding model [or hypothesis] and, more importantly, as a solution for staying away from a Neyman-Pearson-like decision or error evaluation. Usually, when asked about the interpretation of the output, my answer is to compare the behaviour of the posterior on the weight(s) with a posterior associated with a sample from each model, which does sound somewhat similar to posterior predictives if the samples are simulated from the associated predictives. But the issue was not raised during the visit to Bristol, which possibly reflects on how unfrequentist the audience [the Statistics group] is, as it apparently accepted with no further ado the use of a posterior distribution as a soft assessment of the comparative fits of the different models, if not necessarily agreeing on the need to conduct hypothesis testing (especially in the case of the Pima Indian dataset!).

July 21, 2016 at 3:12 pm

Hello Professor,

When in the paper you say that the normalising constant is intractable for geometric mixtures, what do you mean exactly?

Best,

Luiz

July 21, 2016 at 6:37 pm

If I take a geometric mixture of two arbitrary densities, this function requires a normalising constant to be a probability density (i.e., to integrate to one). In most cases, the constant cannot be derived analytically, which is what I call an intractable constant.
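For instance (a numerical illustration with component choices of my own, not from the paper): a geometric mixture of a Normal and a Cauchy density, f₁^α f₂^(1−α), has no closed-form normalising constant, but the constant can be recovered by brute-force quadrature, and Hölder's inequality guarantees it falls below one.

```python
import numpy as np

alpha = 0.3  # illustrative geometric weight

def f1(x):  # N(0,1) density
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def f2(x):  # standard Cauchy density
    return 1.0 / (np.pi * (1 + x ** 2))

# unnormalised geometric mixture f1^alpha * f2^(1-alpha), integrated on a wide grid;
# the Gaussian factor makes the tails negligible well before |x| = 30
xs = np.linspace(-30.0, 30.0, 200001)
dx = xs[1] - xs[0]
Z = np.sum(f1(xs) ** alpha * f2(xs) ** (1 - alpha)) * dx
print(Z)  # the "intractable" constant, recovered numerically; Hölder gives Z < 1
```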

July 21, 2016 at 6:43 pm

Pretty much what I thought, thanks for clarifying. If the densities involved are from the exponential family, it’s possible to show that the resulting mixture will also be an exponential-family density. For simple cases such as Gaussian, gamma and beta, the resulting mixture will be in the same family. Do you think then that in these cases geometric mixtures would be worth looking at?

Cheers,

Luiz

July 21, 2016 at 7:43 pm

’tis true that geometric mixtures of exponential family densities are again from exponential families. This does not always mean the normalising constant is available in closed form, except when both components are from the same exponential family. And in any case my point is more that, on a general basis, when considering testing and model choice, the arithmetic mixture is always manageable.
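To illustrate the same-family case (a sketch with arbitrary parameter values of my own): the geometric mixture of two Gaussian densities is again Gaussian, with precision αt₁+(1−α)t₂ and the corresponding precision-weighted mean, so its constant is fully explicit. A numerical check against the closed form:

```python
import numpy as np

alpha = 0.4                                # illustrative weight
m1, s1, m2, s2 = 0.0, 1.0, 3.0, 2.0        # two Gaussian components
t1, t2 = 1 / s1 ** 2, 1 / s2 ** 2          # component precisions
t = alpha * t1 + (1 - alpha) * t2          # precision of the geometric mixture
m = (alpha * t1 * m1 + (1 - alpha) * t2 * m2) / t  # its mean

def npdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

xs = np.linspace(-20.0, 20.0, 400001)
dx = xs[1] - xs[0]
geo = npdf(xs, m1, s1) ** alpha * npdf(xs, m2, s2) ** (1 - alpha)
Z = np.sum(geo) * dx                       # quadrature check of the closed-form case
gap = np.max(np.abs(geo / Z - npdf(xs, m, 1 / np.sqrt(t))))
print(gap)  # discrepancy with the Gaussian N(m, 1/t); should be numerically negligible
```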

July 21, 2016 at 8:19 pm

Thanks for the response, much appreciated. I’ve been working with geometric mixtures (in the context of logarithmic pooling) and was wondering if we could use them in your framework. But you’re right: linear pools (aka arithmetic mixtures) are more tractable for model choice and testing. Besides, the interpretation of alpha seems to me to be more straightforward.

Best,

Luiz