I was thinking of this: http://arxiv.org/pdf/1403.1345.pdf

But as I said, it’s a different problem from the one you’re considering.

Dan, do you have a more precise reference for David Dunson’s paper? And a link?

I think one of Dunson’s papers has asymptotics for a similar problem. It’s not quite the same (the thetas aren’t inferred), but by putting an appropriate Dirichlet prior on the weights, they got optimal behaviour. I imagine that would work here too.

Great, I should be there for most of that time, except when I’m at the BASP conference ( http://www.baspfrontiers.org ) at the very end of the month!

Sounds most interesting! If you have time to discuss it with me when I am in Oxford, second half of January…?! I’d love to.

Thanks, Florian. I completely agree that, if α has a posterior around 0.5, the main conclusion is that a mixture is the closest to the “true” model in the KL sense. I should rephrase this sentence…

And about the title: Testing sounded more generic and encompassing than Model choice or Model selection, I presume; since we wanted to address the general problem, it seemed more appropriate to use Testing… Of course, this is mostly a posteriori rationalisation.

To build on the remark by Dan Simpson: if we do a standard model selection via a Bayes factor or the like, we only compare the ability of two (or more) model structures to conform to the data. If we formulate the model selection via a mixture model, we are essentially offering a number of additional intermediate models, which could have very different properties in terms of distribution etc.

So, if I get α = 0.5, I wonder how we can distinguish whether both models are equally likely given the data, or whether the mixture is a lot more likely than either of the two.
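To make the mixture formulation concrete, here is a minimal sketch (a hypothetical toy setup, not the paper’s actual models): two fully specified candidate densities, a Beta(0.5, 0.5) prior on the weight α, and a Gibbs sampler using latent allocations. With data genuinely drawn from one component, the posterior of α concentrates near the corresponding boundary rather than near 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data, actually drawn from model 1.
y = rng.normal(0.0, 1.0, size=100)

# Two fully specified candidate models (no free parameters, for simplicity).
def f1(x):  # M1: N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def f2(x):  # M2: N(2, 1)
    return np.exp(-0.5 * (x - 2.0)**2) / np.sqrt(2 * np.pi)

# Gibbs sampler for alpha in the mixture alpha*f1 + (1-alpha)*f2,
# with a Beta(0.5, 0.5) prior on alpha and latent allocations z.
a, b = 0.5, 0.5
alpha, draws = 0.5, []
for _ in range(2000):
    p1 = alpha * f1(y)
    z = rng.random(y.size) < p1 / (p1 + (1 - alpha) * f2(y))
    alpha = rng.beta(a + z.sum(), b + y.size - z.sum())
    draws.append(alpha)

print(np.mean(draws[500:]))  # posterior mean of alpha; favours M1 here
```

A posterior for α piled up near 0 or 1 points at one of the original models; a posterior genuinely concentrated around 0.5 is what raises the question above, of an intermediate mixture fitting better than either endpoint.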

Side remark, but this is just semantics: I wondered why you used “testing hypotheses” and not “model selection” in the title.

I’d actually suggest looking at the log of the marginal likelihood of ω = log(α/(1 – α)) (marginalized to be free of model-specific or moment-encoding parameters). You’ll need a prior on ω for the MCMC computation, but if you turn the resulting samples into a log-posterior-density estimate, you can just subtract off the log-prior (and, if you want, add back any other log-prior to get the correct log-posterior). The caveat is that ω and the model-specific parameters must be independent in the joint prior, but I believe that makes sense in this context…
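A rough sketch of that recipe, in the same toy setup of two fully specified components (everything here is a hypothetical illustration): run a random-walk Metropolis chain on ω = log(α/(1 – α)) under a standard-logistic working prior (i.e. a uniform working prior on α), turn the ω samples into a density estimate, and subtract the log working prior off the log-posterior-density estimate.

```python
import numpy as np
from scipy.stats import gaussian_kde, logistic

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=100)  # hypothetical data, drawn from model 1

def log_lik(w):
    """Mixture log-likelihood as a function of omega = logit(alpha)."""
    alpha = 1.0 / (1.0 + np.exp(-w))
    dens = (alpha * np.exp(-0.5 * y**2) +
            (1 - alpha) * np.exp(-0.5 * (y - 2.0)**2)) / np.sqrt(2 * np.pi)
    return np.sum(np.log(dens))

# Random-walk Metropolis on omega under a standard-logistic working prior.
w, samples = 0.0, []
lp = log_lik(w) + logistic.logpdf(w)
for _ in range(5000):
    w_new = w + rng.normal(0, 0.5)
    lp_new = log_lik(w_new) + logistic.logpdf(w_new)
    if np.log(rng.random()) < lp_new - lp:
        w, lp = w_new, lp_new
    samples.append(w)

# Log marginal likelihood of omega, up to a constant:
# log posterior density estimate minus log working prior.
kde = gaussian_kde(samples[1000:])
def log_marg(w):
    return np.log(kde(w)[0]) - logistic.logpdf(w)
```

Differences of `log_marg` at two ω values then approximate a log likelihood ratio, and the working prior drops out of that comparison.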

This is analogous to how a Bayes factor can be computed by MCMC (with, say, the Carlin & Chib pseudo-prior approach, or RJMCMC) by picking a working prior over model probabilities and then taking the ratio of the posterior model probability to the prior model probability. And just as *that* working prior ought to be chosen for its computational properties, I would argue that MCMC in the estimation-as-model-testing setting should use a working prior chosen (or tuned on-line!) to provide good MCMC performance.
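As a toy illustration of that invariance (two fully specified models, so the model indicator can be sampled directly and no pseudo-priors are actually needed; all numbers are hypothetical), the Bayes factor recovered as posterior odds divided by prior odds should come out the same whatever working prior is used:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fixed toy data and two fully specified candidate models, N(0,1) vs N(2,1).
y = np.array([1.0, 0.8, 1.2])

def log_f(x, mu):
    return np.sum(-0.5 * (x - mu)**2 - 0.5 * np.log(2 * np.pi))

log_m1, log_m2 = log_f(y, 0.0), log_f(y, 2.0)  # marginal log-likelihoods

def estimate_bf(prior1, n_draws=200_000):
    """Sample the model indicator under a working prior, then divide
    the posterior odds by the prior odds to recover the Bayes factor."""
    p1 = prior1 * np.exp(log_m1)
    p2 = (1 - prior1) * np.exp(log_m2)
    draws = rng.random(n_draws) < p1 / (p1 + p2)  # indicator of model 1
    post1 = draws.mean()
    return (post1 / (1 - post1)) / (prior1 / (1 - prior1))

print(estimate_bf(0.5), estimate_bf(0.2))  # both estimate the same BF
```

The two working priors give the same Bayes factor up to Monte Carlo noise, which is the sense in which the working prior is purely a computational device and can be tuned for sampler performance (here, to balance how often each model is visited).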
