Thanks for your answer David!

I guess my question would then be: if local modes are the computable quantity, then how does this differ from penalised likelihood? (There may be a conceptual difference, but I’d still struggle not to see local MAP estimation as careful penalised likelihood.)

I guess when I say Bayesian Sparsity I mean “full posterior analysis for very high-dimensional models with a priori mass on sparse signals”.

My fear with something like VB is that the resulting intervals (which will not be the credible intervals) are not interpretable, and so it’s only the location of the prior that is meaningful.

Or to put it differently, simply not ignoring uncertainty is not enough to make the inference more valid than one that only provides a mode. In some sense, I feel like not providing uncertainty is better than providing too-narrow (or otherwise misleading) uncertainty.

(I’ve not seen high-dimensional models [bigger than, say, 5e5 parameters] where the computed interval is meaningful, but that doesn’t mean that such a thing doesn’t exist! And if it does I’d love to see it!)
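As a toy illustration of the worry about VB intervals (my own construction, not from the thread): for a correlated bivariate Gaussian target, the optimal mean-field Gaussian factors match the diagonal of the precision matrix, so each factor gets variance 1/(Sigma^{-1})_{ii} = 1 - rho^2 instead of the true marginal variance of 1:

```python
import numpy as np

# Mean-field VI on a bivariate Gaussian target with correlation rho.
# The optimal mean-field Gaussian factors have precision equal to the
# diagonal of the target's precision matrix, so each marginal variance
# comes out as 1/(Sigma^{-1})_{ii} = 1 - rho^2, not the true value of 1.
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
true_var = np.diag(Sigma)
vb_var = 1.0 / np.diag(np.linalg.inv(Sigma))
width_ratio = np.sqrt(vb_var / true_var)   # VB interval width / true width
print(width_ratio)
```

With rho = 0.9 the VB intervals come out at sqrt(1 - 0.81) ≈ 0.44 of the true marginal width, i.e. less than half as wide: exactly the “too narrow to be interpretable” failure mode described above.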

I guess that I see Bayes as a means to an end (meaningful inference) rather than an end in itself. So I want to see some reason to go to the massive inconvenience of a Bayesian analysis before I do it.

So I still feel like I don’t understand the aim of the Bayesian Sparsity community. It might be that I’m making a false distinction between aiming for “Sparsity” and “posterior Sparsity”. And this is certainly not the corner of statistics in which I spend my time (which, incidentally, does not have totally different problems).

But even after chewing on your excellent response, I’m still having problems with Bayesian Sparsity. (But as always, there’s no reason to expect the problem isn’t just me)

Mixing prior and utility is not a sin in my opinion! Thanks for the detailed answer, worthy of a post on its own!

Given these theoretical & practical advantages, the real question is: can we do it in practice? My personal experience is that, while a full model search is not feasible, finding local modes on the model space is no harder computationally than finding local modes for penalized likelihoods / continuous posteriors, and in practice it often returns a better solution (in terms of model selection). That is, for some reason the idea has spread that “classical” Bayesian model selection is infeasible, but that is not the case in my experience. I would say that this is an interesting open research question, but definitely not infeasible. I’ve done quite a lot of applied work and can tell you that (if you are careful) you can get pretty good results in practice that you just won’t get with most shrinkage approaches.
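A minimal sketch of the kind of local-mode search over model space mentioned above (my own illustration, with BIC standing in for the log posterior model probability — the actual priors and scores used in practice would differ): greedily toggle variables in or out until no single change improves the score.

```python
import numpy as np

# Simulated sparse regression problem (hypothetical data, for illustration).
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # sparse truth: 3 active variables
y = X @ beta_true + rng.standard_normal(n)

def score(model):
    """Higher is better; BIC-style stand-in for log posterior model prob."""
    cols = sorted(model)
    if cols:
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = np.sum((y - X[:, cols] @ coef) ** 2)
    else:
        rss = np.sum(y ** 2)
    return -n * np.log(rss / n) - len(model) * np.log(n)

# Hill-climb to a local mode of the (approximate) model posterior.
model = set()
improved = True
while improved:
    improved = False
    for j in range(p):
        candidate = model ^ {j}            # toggle: add or drop variable j
        if score(candidate) > score(model):
            model, improved = candidate, True
print(sorted(model))                       # selected model
```

Each accepted move strictly improves the score and the model space is finite, so the search terminates at a local mode — the same flavour of computation as coordinate methods for penalized likelihoods, but operating directly on discrete models.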

Arrgh, I’m afraid I wrote too long an answer, apologies. Re Xian’s comment, I completely agree that sensitivity to prior parameters is an issue for any model selection prior (local or non-local). However, results are often only sensitive to big changes in the prior (e.g. taking things to infinity as in the Jeffreys-Lindley-Bartlett paradox); moderate changes usually have little impact (Dawid has an interesting 1995 paper on that, “The trouble with Bayes factors”). In most real applications the range of “reasonable” prior parameter values is relatively small; e.g. in practice I check that the range of parameter values where I’m putting prior mass is reasonable (effect sizes beta/sigma between 0.2 and 2, odds or hazard ratios between 1.5 and 3, or whatever). I realize this is a bit subjective, but again small changes to these numbers don’t affect results very much, even for moderately large n. A perhaps more “objective” strategy I personally like is to follow the “unit information prior” philosophy and set default prior parameters to match the entropy / variance of the UIP; again, it often works surprisingly well. Another interesting option, pursued by Val Johnson, is calibration via frequentist type I error; this might make special sense when false positives cost us something (though I realize we’re sinning & intermingling the prior with the utility). Anyway, just my biased personal ramblings. :-)
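The sensitivity pattern described above can be checked directly in the simplest case (a sketch of my own, with hypothetical numbers): testing a normal mean with known variance, H0: theta = 0 against H1: theta ~ N(0, tau^2), where the Bayes factor is available in closed form as a ratio of marginal densities of the sample mean.

```python
import numpy as np

# Hypothetical data summary: sample mean ybar from n observations, known sigma.
n, sigma, ybar = 50, 1.0, 0.4
se = sigma / np.sqrt(n)

def normal_pdf(x, sd):
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def bf10(tau):
    """Bayes factor for H1: theta ~ N(0, tau^2) vs H0: theta = 0.
    Under H1 the marginal of ybar is N(0, se^2 + tau^2)."""
    return normal_pdf(ybar, np.sqrt(se**2 + tau**2)) / normal_pdf(ybar, se)

for tau in [0.5, 1.0, 2.0, 10.0, 100.0]:
    print(f"tau = {tau:6.1f}   BF10 = {bf10(tau):.3f}")
```

Moderate prior scales (tau between roughly 0.5 and 2) all leave BF10 on the same side of 1, while letting tau grow without bound drives BF10 toward zero regardless of the data — the Jeffreys-Lindley-Bartlett behaviour: it is the extreme, not the moderate, prior choices that flip the conclusion.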

The large size of the model space is what suggests a Bayesian approach. The LASSO and its variants will return an answer that represents some local mode, but almost surely not the global mode. So the returned model is probably not robust, and the standard errors substantially under-estimate the true uncertainty of the joint distribution over the coefficients. Of course it’s hard, and maybe not yet well resolved, how to more reliably (under a fixed sample size, not asymptotically) assign a high posterior probability to the set of models close to the truth. Of course, no method can sample or assign a probability to every point in the model space. The hope is to have a method that finds the high posterior region. That said, at least the Bayesian approach is actually contemplating the true uncertainty.

I feel like I’m probably being uncharitable here, but I’m not seeing a practical advantage to this work (which, let’s face it, has been going on for a while). But this is one of those situations where I’d rather be wrong…
