I also note an overlap: I’ve seen people use the LASSO or ridge regression first to identify the optimal set (or sets) of covariates, then perform a second round of fitting to estimate the coefficients, as a way to avoid the under-estimation of coefficients caused by the shrinkage penalty. This seems similar to the separate derivation of the parameter priors post-weighting in your mixture-of-models scheme.
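For concreteness, a minimal sketch of that two-stage procedure (lasso for selection, then an unpenalized refit on the selected covariates to undo the shrinkage), using scikit-learn and synthetic data purely for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_coef = np.zeros(p)
true_coef[:3] = [3.0, -2.0, 1.5]          # only 3 covariates matter
y = X @ true_coef + rng.standard_normal(n)

# Stage 1: lasso both shrinks and selects
lasso = Lasso(alpha=0.5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)    # indices with nonzero coefficients

# Stage 2: ordinary least squares on the selected covariates only,
# so the retained coefficients are no longer biased toward zero
ols = LinearRegression().fit(X[:, selected], y)

print("selected:", selected)
print("lasso coefs:", lasso.coef_[selected])   # shrunk toward zero
print("refit coefs:", ols.coef_)               # unpenalized estimates
```

This is sometimes called a "relaxed" or debiased refit; the second stage inherits the first stage's selection but not its penalty.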

But otherwise, I think my favourite bit of your mixture-as-model-choice paper is that you get good predictions for free.

I wonder whether you get better predictions than with more traditional BMA, given that the mixture weights have a polynomial learning rate versus the exponential learning rate of Bayes factors.
