## the demise of the Bayes factor

Posted in Books, Kids, Statistics, Travel, University life on December 8, 2014 by xi'an

With Kaniav Kamary, Kerrie Mengersen, and Judith Rousseau, we have just arXived (and submitted) a paper entitled “Testing hypotheses via a mixture model”. (We actually presented earlier versions of this work in Cancún, Vienna, and Gainesville, so you may have heard of it already.) The notion we advocate in this paper is to replace the posterior probability of a model or a hypothesis with the posterior distribution of the weights of a mixture of the models under comparison. That is, given two models under comparison,

$\mathfrak{M}_1:x\sim f_1(x|\theta_1) \text{ versus } \mathfrak{M}_2:x\sim f_2(x|\theta_2)$

we propose to estimate the (artificial) mixture model

$\mathfrak{M}_{\alpha}:x\sim\alpha f_1(x|\theta_1) + (1-\alpha) f_2(x|\theta_2)$

and in particular derive the posterior distribution of α. One may object that the mixture model is neither of the two models under comparison, but it coincides with each of them at the boundary, i.e., when α=0 or α=1. Thus, if we use prior distributions on α that favour the neighbourhoods of 0 and 1, we should be able to see the posterior concentrate near 0 or 1, depending on which model is true. And indeed this is the case: for any given Beta prior on α, we observe an increasingly sharp concentration at the appropriate boundary as the sample size increases, and we establish a convergence result to this effect. Furthermore, the mixture approach offers numerous advantages, among which [verbatim from the paper]:
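The boundary concentration is easy to see numerically in a toy setting. The sketch below is not one of the paper's examples: it takes two fully specified components (whereas the paper handles unknown θ's), a Beta(0.5, 0.5) prior on α favouring the neighbourhoods of 0 and 1, and evaluates the posterior of α on a grid.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical toy comparison: M1: N(0,1) versus M2: N(2,1),
# both with known parameters, embedded in the encompassing mixture.
x = rng.normal(0.0, 1.0, size=200)  # data actually generated from M1

f1 = stats.norm.pdf(x, 0.0, 1.0)
f2 = stats.norm.pdf(x, 2.0, 1.0)

# Beta(0.5, 0.5) prior on the weight alpha of component f1
alphas = np.linspace(1e-3, 1 - 1e-3, 999)
log_post = np.array([stats.beta.logpdf(a, 0.5, 0.5)
                     + np.sum(np.log(a * f1 + (1 - a) * f2))
                     for a in alphas])
post = np.exp(log_post - log_post.max())
post /= post.sum()

post_mean = float(np.sum(alphas * post))
print(post_mean)  # concentrates near 1, since M1 generated the data
```

With the two components this well separated and n=200, essentially all the posterior mass sits near α=1, as the convergence result predicts.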

## Dear Sir, I am unable to understand…

Posted in Statistics, University life on January 30, 2013 by xi'an

Here is an email I received a few days ago, similar to many other emails I/we receive on a regular basis:

> I am working on Markov Chain Monte Carlo methods as part of my Masters project. I have to estimate mean, variance from a Gaussian mixture using metropolis method.  I came across your paper ‘Bayesian Modelling and Inference on Mixtures of Distributions’. I am unable to understand how to obtain the new sample for mean, variance etc… I am using uniform distribution as proposal distribution. Should it be random numbers for the proposal distribution.
> I have been working and trying to understand this for a long time. I would be grateful for any help.

While I felt sorry for the Master's student, I consider it the responsibility of his/her advisor to give her/him the proper directions for understanding the paper. (Given the contents of the email, it sounds as if the student requires proper training in both Bayesian statistics [uniform priors on unbounded parameters?] and simulation [the question about random numbers does not make sense]…) This is what I replied to the student, hopefully in a positive tone.
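For readers puzzled by the same question: "obtaining the new sample" in a Metropolis scheme simply means perturbing the current value with a random draw from the proposal and accepting or rejecting it. Here is a minimal random-walk Metropolis sketch, not the paper's algorithm, on a hypothetical two-component Gaussian mixture with known weight and unit variances, so that only the two means are unknown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: mixture 0.3 N(0,1) + 0.7 N(3,1)
w = 0.3
x = np.where(rng.random(300) < w,
             rng.normal(0.0, 1.0, 300),
             rng.normal(3.0, 1.0, 300))

def log_post(mu):
    # flat prior on (mu1, mu2) plus the mixture log-likelihood
    dens = (w * np.exp(-0.5 * (x - mu[0]) ** 2)
            + (1 - w) * np.exp(-0.5 * (x - mu[1]) ** 2))
    return np.sum(np.log(dens))

mu = np.array([0.0, 1.0])
lp = log_post(mu)
chain = []
for _ in range(5000):
    prop = mu + rng.normal(0.0, 0.2, size=2)  # random-walk proposal
    lp_prop = log_post(prop)
    # Metropolis acceptance step
    if np.log(rng.random()) < lp_prop - lp:
        mu, lp = prop, lp_prop
    chain.append(mu.copy())

chain = np.array(chain)
print(chain[2500:].mean(axis=0))  # posterior means, roughly (0, 3)
```

The proposal scale (0.2 here) is a tuning parameter; a uniform random-walk proposal would work as well, but the proposal must be centred at the current value, not drawn uniformly over the parameter space.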

## Reference prior for logistic regression

Posted in Statistics on January 14, 2009 by xi'an

Gelman et al. just published a paper in the Annals of Applied Statistics on the selection of a prior on the parameters of a logistic regression. The idea is to scale the prior in terms of the impact of a “typical” change in a covariate on the probability function, which is reasonable as long as there is enough independence between those covariates. The covariates are first rescaled to all have the same expected range, which amounts, to me, to a kind of empirical Bayes estimation of the scales in an unnormalised problem. The parameters are then given independent Cauchy (or t) priors, whose scale is chosen as 2.5 so that the ±5 logistic range corresponds to the extremal values. The perspective is well motivated within the paper, and supported in addition by the availability of an R package called bayesglm.
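The recipe is short enough to sketch directly. The fragment below is a hedged illustration rather than a reimplementation of bayesglm: it rescales (non-binary) covariates to standard deviation 0.5, as the paper prescribes, puts independent Cauchy(0, 2.5) priors on the coefficients, and finds the posterior mode on simulated data (intercept omitted for brevity).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(2)

# Hypothetical simulated data with true coefficients (1, -0.5, 0)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < expit(X @ np.array([1.0, -0.5, 0.0]))).astype(float)

# Rescale non-binary covariates to sd 0.5 (binary inputs would
# instead be shifted to mean 0, per the paper's recipe)
Xs = 0.5 * (X - X.mean(0)) / X.std(0)

def neg_log_post(beta):
    eta = Xs @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    # independent Cauchy(0, 2.5) log-priors on the coefficients
    logprior = -np.sum(np.log1p((beta / 2.5) ** 2))
    return -(loglik + logprior)

beta_map = minimize(neg_log_post, np.zeros(3)).x
print(beta_map)  # signs match the true coefficients
```

On the rescaled covariates, a unit coefficient corresponds to a two-standard-deviation change in the original input, which is how the 2.5 scale acquires its interpretation.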

This being said, I would have liked to see a comparison of bayesglm with the generalised g-prior perspective we develop in Bayesian Core, rather than with the flat prior, which is not the correct Jeffreys’ prior and which anyway does not always lead to a proper posterior. In fact, the independent prior seems too rudimentary in the case of many (inevitably correlated) covariates, with the scale of 2.5 then being too large even when brought back to a reasonable change in the covariate. On the other hand, starting with a g-like prior on the parameters and using a non-informative prior on the factor g allows both for a natural data-based scaling and for an accounting of the dependence between the covariates. This non-informative prior on g then amounts to a generalised t prior on the parameter, once g is integrated out. Anyone interested in the comparison can use the functions provided here on the webpage of Bayesian Core. (The paper already includes a comparison with Jeffreys’ prior as implemented in brglm and with the BBR algorithm of Genkin et al. (2007).) In the revision of Bayesian Core, we will most likely draw this comparison.
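The t-prior claim rests on a standard normal scale-mixture identity, sketched here with an inverse-gamma choice standing in for whichever prior on g is used in practice: if

$\beta\mid g \sim \mathcal{N}_p(0, g\Sigma) \quad\text{and}\quad g\sim\mathcal{IG}(a/2, a/2),$

then integrating g out yields

$\pi(\beta) \propto \int_0^\infty g^{-(p+a)/2-1}\exp\left\{-\tfrac{1}{2g}\left(\beta^\mathsf{T}\Sigma^{-1}\beta + a\right)\right\}\,\mathrm{d}g \propto \left(1 + \beta^\mathsf{T}\Sigma^{-1}\beta/a\right)^{-(a+p)/2},$

i.e., a multivariate t prior with a degrees of freedom, hence the generalised t once the g-prior structure Σ is data-based.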