Our message is not that variational approximation should replace MCMC for all posterior inference. The approximation, although proved to have the desired asymptotic properties, could miss some interesting dependence structure on a finite data set.

We would suggest using the proposed algorithm as a Bayesian screening procedure: reduce the number of features from p to a moderate size based on the (marginal) inclusion probabilities, and then carry out a fully Bayesian analysis (e.g., using MCMC) on the reduced data set.
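A minimal sketch of the screening step, for illustration only. The function name `screen_features`, the 0.5 threshold, the `max_keep` cap, and the probabilities below are all hypothetical; the inclusion probabilities are assumed to have been produced already by the variational algorithm.

```python
import numpy as np

def screen_features(inclusion_probs, threshold=0.5, max_keep=None):
    """Return sorted indices of features with inclusion probability >= threshold.

    If max_keep is given, keep at most that many features, preferring
    those with the largest inclusion probabilities.
    """
    probs = np.asarray(inclusion_probs, dtype=float)
    keep = np.flatnonzero(probs >= threshold)
    if max_keep is not None and keep.size > max_keep:
        # Rank the retained features by decreasing probability (stable for ties).
        order = np.argsort(-probs[keep], kind="stable")
        keep = np.sort(keep[order[:max_keep]])
    return keep

# Hypothetical marginal inclusion probabilities for p = 8 features.
probs = np.array([0.02, 0.91, 0.10, 0.65, 0.03, 0.55, 0.01, 0.97])
selected = screen_features(probs, threshold=0.5, max_keep=3)
# The reduced design matrix X[:, selected] would then go to a full MCMC analysis.
```

The threshold and the cap are tuning choices assumed here, not values given in the discussion; in practice they would be set so that the reduced problem is small enough for MCMC.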

1) I think I meant by this (week-old) remark that using EM for some parameters takes them away from the Bayesian paradigm, since they are treated as “fixed” in the sense of losing the variability afforded by Bayesian inference. (I may have meant something completely different though!!!)

2) I am not sure I fully get your argument, as using a product in the variational representation seems to imply independence. The use of the whole data sounds irrelevant in that respect, but I may miss the point…

I don’t get your point here. The weights are estimated by EM, fixed only for a given data set, but still data dependent.

2) “Second, the fact that the variational approximation is a product is confusing in that the posterior distribution on the regression coefficients is unlikely to produce posterior independence.”

You are absolutely right! Most readers would think the product form implies that the posterior (from our algorithm) on each component of beta is independent of the others; they are not, since the parameters for each component are estimated from all the data and are therefore dependent.

I wouldn’t think this is a drawback of this algorithm; as a matter of fact, the true posterior should not be independent.
