I’m thinking here of Johnson and Rossell’s non-local priors (J. R. Statist. Soc. B (2010), 72, Part 2, pp. 143–170). Their priors don’t change the consistency of Bayesian model selection, but they do address a bizarre asymmetry that occurs when the prior under the alternative puts a lot of probability mass in the neighborhood of the null (what they term a “local prior”). The asymmetry is in the rates at which evidence accumulates for the true model when the null is true versus when the alternative is true. Under local priors, evidence accumulates much more slowly for a true null, because the alternative looks a lot like it. Non-local priors move probability mass out of the neighborhood of the null, speeding up evidence accumulation for a true null.
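As a quick illustration (my own sketch, not code from the paper): Johnson and Rossell’s simplest non-local prior, the first-order moment prior, multiplies a normal density centred at the null by θ², so the prior density vanishes exactly at the null value θ = 0 while the corresponding local prior peaks there. The scale `tau` below is an arbitrary choice for illustration.

```python
import numpy as np

tau = 1.0  # prior scale; arbitrary choice for this sketch

def local_prior(theta, tau=tau):
    """N(0, tau) density: a local prior, at its maximum at the null theta = 0."""
    return np.exp(-theta**2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)

def moment_prior(theta, tau=tau):
    """First-order moment prior: theta^2 / tau times the N(0, tau) density.
    Dividing by tau (= E[theta^2] under the normal) renormalises it to a proper density."""
    return theta**2 / tau * local_prior(theta, tau)

theta = np.linspace(-10.0, 10.0, 100_001)
dens = moment_prior(theta)
# trapezoid rule by hand (avoids the NumPy 2.0 trapz/trapezoid rename)
integral = np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(theta))

print(moment_prior(0.0))   # exactly 0: no mass piled up at the null
print(integral)            # ~1: still a proper prior
print(local_prior(0.0))    # ~0.3989: the local prior peaks at the null
```

The θ² factor is what buys the faster accumulation of evidence for a true null: data near θ = 0 are now far less likely under the alternative’s prior than they would be under a local prior.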

As explained, perhaps in more detail, in my initial draft of our letter, the analogy with the Venn diagram and the overlapping models does not apply to model comparison. In a Bayesian framework, when several models are under comparison, the model index becomes one (unknown) parameter *M*, and the parameters of each model are defined conditionally on the value of *M*. Since *M* cannot take two values simultaneously, two models cannot “co-exist”. This is also why considering parameters that are “common” to all models does not make sense (even though I may use this phrasing from time to time to justify calling for a “common” prior on, say, a variance parameter across all models). Another reason for a watertight separation between models is that the purpose of selecting a model is to…select a model, and hence to work within that model once the decision is made. Letting the properties of other (rejected) models interfere with the inference on the chosen model is not coherent. (Great title for the reading group!)