Thanks very much for the answer. I was asking about the horseshoe because part of the Biometrika paper deals with thresholding and, conditional on the threshold, with dropping some variables. My feeling was that using the horseshoe prior and reporting posteriors from this model is enough. That is, we do not need to re-run a model with only the predictors that were selected before, since that would be using the data twice (although, given my experience so far with the publication process in ecology, this is probably what reviewers will ask for).

Hence my question about the horseshoe. I feel that in practice it could also fall into your category (e), but that on its own it really is fine, i.e. category (d).

Or, in Tolkienian prose, the horseshoe may then be some sort of “One model to find them all and in the prior shrink them”…

Sorry for bugging you with probably naïve questions, and a thousand thanks for taking the time to reply.

Cheers

Matthieu

(a) “using the data twice” is not related to Bayesian model selection per se, in that we can run proper Bayesian model selection without using the data twice. It is also possible to “use the data twice” in non-Bayesian settings.

(a’) by the way variable selection

(b) the Lasso is a penalised likelihood technique for variable selection / nested model comparison. It has a vague Bayesian flavour that was made precise by Park and Casella (2008).

(c) AIC, BIC, etc., and the Lasso are in principle free from the sin of “using the data twice”, being all of the likelihood ratio type.

(c’) this is less clear for DIC!

(d) Carvalho, Scott and Polson have reanalysed the Bayesian lasso by picking

(e) in practice, people use mixtures of Bayesian and non-Bayesian techniques, like plug-in hyperparameters, which end up using the data more than once…
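Point (e) can be made concrete with a minimal sketch (my own illustration, not from the discussion above): an empirical Bayes analysis of a normal means model, where the data are used once to estimate a hyperparameter and a second time in the posterior that treats that estimate as fixed.

```python
import numpy as np

rng = np.random.default_rng(2)

# Normal means model: y_j ~ N(theta_j, 1), theta_j ~ N(0, tau^2).
tau_true = 2.0
theta = rng.normal(0.0, tau_true, size=50)
y = theta + rng.normal(size=50)

# First use of the data: plug-in (empirical Bayes) estimate of tau^2
# from the marginal distribution y_j ~ N(0, tau^2 + 1).
tau2_hat = max(np.mean(y ** 2) - 1.0, 0.0)

# Second use of the same data: posterior means computed as if tau2_hat
# were a fixed, known hyperparameter.
shrink = tau2_hat / (tau2_hat + 1.0)
post_mean = shrink * y
```

The double use is visible in `tau2_hat` entering `post_mean`: the same `y` both calibrates the prior and feeds the posterior, which is exactly what a fully Bayesian treatment (a hyperprior on tau) avoids.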

Thanks, and see you soon!

As an ecologist, I am very often confronted with the issue of model selection. Most of the time, this is done by some automated procedure that computes the AIC of all possible variable combinations. I suspect that what we are doing is then more variable selection than model selection. Anyway, the process is tedious and looks a bit like overkill: most of the models considered have not been thought about (after all, R does this on its own without thinking) and some are just not plausible.
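The automated all-subsets procedure described above can be sketched in a few lines (a toy version with Gaussian AIC for OLS fits; the data, predictor count, and helper name are my own assumptions):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 candidate predictors, only the first two actually matter.
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def aic_ols(Xs, y):
    """Gaussian AIC for an OLS fit: n*log(RSS/n) + 2k, k counting
    intercept, slopes, and the error variance."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), Xs])           # add intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    k = Xd.shape[1] + 1
    return n * np.log(rss / n) + 2 * k

# Exhaustive search over all 2^p - 1 non-empty predictor subsets,
# exactly the brute-force sweep an automated routine performs.
best = min(
    (frozenset(s) for r in range(1, p + 1)
     for s in itertools.combinations(range(p), r)),
    key=lambda s: aic_ols(X[:, sorted(s)], y),
)
```

With 5 predictors this is only 31 fits, but the count doubles with every added variable, which is part of why the procedure feels like overkill.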

I recently read the Biometrika paper of Carvalho, Scott and Polson about the horseshoe estimator for sparse signals. I would be very curious to know your thoughts about this procedure. I have used the horseshoe in a logistic regression of clutch size in a seabird. It was very useful in shrinking some spurious signals (evaluated from posterior predictive checks… I guess this is using the data twice…).
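For readers unfamiliar with why the prior shrinks spurious signals while leaving real ones alone, here is a minimal prior-simulation sketch (assuming unit global scale and unit noise; not the paper's seabird analysis): under the horseshoe, the shrinkage weight per coefficient follows a Beta(1/2, 1/2) law whose U shape piles mass at "keep the signal" and "kill the noise".

```python
import numpy as np

rng = np.random.default_rng(1)

# Horseshoe hierarchy: beta_j | lambda_j ~ N(0, lambda_j^2 * tau^2),
# lambda_j ~ half-Cauchy(0, 1).  With tau = 1 and unit noise, the
# shrinkage weight kappa_j = 1 / (1 + lambda_j^2) is Beta(1/2, 1/2):
# kappa near 0 means the coefficient is left alone, kappa near 1
# means it is shrunk to zero.
lam = np.abs(rng.standard_cauchy(size=100_000))   # half-Cauchy draws
kappa = 1.0 / (1.0 + lam ** 2)

# The eponymous horseshoe shape: heavy mass in both extreme bins,
# comparatively little in the middle.
near_0 = np.mean(kappa < 0.1)
near_1 = np.mean(kappa > 0.9)
middle = np.mean((kappa > 0.45) & (kappa < 0.55))
```

The two spikes are what make a single horseshoe fit behave like simultaneous estimation and selection, which is the heart of the question below.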

My more focused question is then: do you think the horseshoe prior can solve this “using the data twice” problem?

Yours sincerely

Matthieu