## variational Bayes for variable selection

**X**ichen Huang, Jin Wang and Feng Liang have recently arXived a paper where they rely on variational Bayes in conjunction with a spike-and-slab prior modelling. This actually stems from an earlier paper by Carbonetto and Stephens (2012), the difference lying in the implementation of the method, which is less Gibbs-like in the current paper. The approach is not fully Bayesian in that not only is an approximate (variational) representation used for the parameters of interest (regression coefficients and presence-absence indicators), but the nuisance parameters are also replaced with MAP estimates. The variational approximation on the regression parameters is an independent product of spike-and-slab distributions. The authors show the approximate approach is consistent in both frequentist and Bayesian terms (under identifiability assumptions). The method is undoubtedly faster than MCMC since it shares many features with EM, but I still wonder at the Bayesian interpretability of the outcome, which writes out as a product of estimated spike-and-slab mixtures. First, the weights in the mixtures are estimated by EM, hence fixed. Second, the fact that the variational approximation is a product is confusing in that the posterior distribution on the regression coefficients is unlikely to produce posterior independence.
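To make the product-form approximation concrete, here is a minimal sketch of mean-field coordinate-ascent updates for a spike-and-slab linear regression, in the spirit of Carbonetto and Stephens (2012): each factor q_j is a spike-and-slab mixture with its own inclusion probability α_j. The function name and parameterisation are mine, and the hyperparameters (noise variance, slab variance, prior inclusion probability) are held fixed here, whereas the paper under discussion fits such nuisance parameters by MAP/EM; treat this as an illustration of the variational family, not as the authors' algorithm.

```python
import numpy as np

def spike_slab_cavi(X, y, sigma2=1.0, sigma_b2=1.0, pi=0.1, n_iter=50):
    """Coordinate-ascent updates for a mean-field spike-and-slab
    approximation q(beta, gamma) = prod_j q_j, where each q_j puts mass
    alpha_j on a Gaussian slab N(mu_j, s2_j) and mass 1 - alpha_j on a
    point mass at zero. Hyperparameters are assumed known and fixed.
    """
    n, p = X.shape
    xtx = np.sum(X**2, axis=0)                 # x_j' x_j for each column
    alpha = np.full(p, pi)                     # q(gamma_j = 1)
    mu = np.zeros(p)                           # slab means
    s2 = sigma2 / (xtx + sigma2 / sigma_b2)    # slab variances (fixed given hyperparameters)
    Xr = X @ (alpha * mu)                      # running fit X E[beta]
    for _ in range(n_iter):
        for j in range(p):
            Xr -= X[:, j] * (alpha[j] * mu[j])            # remove j's contribution
            mu[j] = s2[j] / sigma2 * (X[:, j] @ (y - Xr)) # slab mean update
            logit = (np.log(pi / (1 - pi))                # prior log-odds
                     + 0.5 * np.log(s2[j] / sigma_b2)     # Occam factor
                     + mu[j]**2 / (2 * s2[j]))            # evidence for inclusion
            alpha[j] = 1.0 / (1.0 + np.exp(-logit))       # inclusion probability
            Xr += X[:, j] * (alpha[j] * mu[j])            # restore contribution
    return alpha, mu, s2
```

The product form is visible in the code: each coordinate keeps its own (α_j, μ_j, s²_j), and the only coupling between coordinates runs through the shared residual, which is exactly why the fitted factors end up data-dependent despite the factorised form.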

February 11, 2022 at 7:55 pm

The nuisance params here are the prior variances for the regression coefficients, right? I’m trying to do something similar, and have been running into philosophical quandaries related to another topic you blogged about: the lack of transformation invariance that MAPs display. These philosophical quandaries have recently metastasized into practical ones as the optimization problem is best solved in a different parameterization than the one which gives good MAP behavior…

March 30, 2016 at 7:16 pm

1) “The weights in the mixtures are estimated by EM, hence fixed”

I don’t get your point here. The weights are estimated by EM, fixed only for a given data set, but still data dependent.

2) “Second, the fact that the variational approximation is a product is confusing in that the posterior distribution on the regression coefficients is unlikely to produce posterior independence.”

You are absolutely right! Most readers would think the product form implies that the posterior (from our algorithm) on each component of beta is independent of the others; they are not, since the parameters for each component are estimated based on all the data and are therefore dependent.

I wouldn’t think this is a drawback of this algorithm; as a matter of fact, the true posterior should not be independent.

March 30, 2016 at 9:51 pm

Thanks, Feng!

1) I think I meant by this (week-old) remark that using EM for some parameters takes them away from the Bayesian paradigm since they are treated as “fixed”, in the sense of losing the variability afforded by Bayesian inference. (I may have meant something completely different though!!!)

2) I am not sure I fully get your argument as using a product in the variational representation seems to imply independence. The use of the whole data sounds irrelevant in that respect, but I may miss the point…

March 31, 2016 at 2:11 am

Hello, Chris! I might have misunderstood your 2) comment; you can ignore my previous response.

The message here is not about replacing MCMC by variational approximation in all of the posterior inference. It is possible that the approximation, although proved to have the desired asymptotic properties, could miss some interesting dependence structure on a finite data set.

We would suggest using the proposed algorithm as a Bayesian screening procedure: reduce the number of features from p to a moderate size based on the (marginal) inclusion probabilities, and then carry out a fully Bayesian analysis (e.g., using MCMC) on the reduced data set.
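The screening step described above can be sketched in a few lines; the function name and the 0.5 cut-off are mine, chosen only for illustration, and the paper may well recommend a different rule for choosing the threshold.

```python
import numpy as np

def vb_screen(alpha, threshold=0.5):
    """Return the indices of features whose estimated marginal inclusion
    probability meets a (hypothetical) threshold; a fully Bayesian
    analysis, e.g. by MCMC, would then be run on X[:, keep] only."""
    return np.flatnonzero(np.asarray(alpha) >= threshold)
```

For instance, `vb_screen([0.95, 0.02, 0.7, 0.1])` keeps features 0 and 2, shrinking the design matrix before the expensive fully Bayesian pass.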