**X**ichen Huang, Jin Wang and Feng Liang have recently arXived a paper where they rely on variational Bayes in conjunction with spike-and-slab prior modelling. This actually stems from an earlier paper by Carbonetto and Stephens (2012), the difference being in the implementation of the method, which is less Gibbs-like in the current paper. The approach is not fully Bayesian in that not only is an approximate (variational) representation used for the parameters of interest (regression coefficients and presence-absence indicators), but the nuisance parameters are also replaced with MAPs. The variational approximation on the regression parameters is an independent product of spike-and-slab distributions. The authors show the approximate approach is consistent in both frequentist and Bayesian terms (under identifiability assumptions). The method is undoubtedly faster than MCMC since it shares many features with EM, but I still wonder at the Bayesian interpretability of the outcome, which writes out as a product of estimated spike-and-slab mixtures. First, the weights in the mixtures are estimated by EM, hence fixed. Second, the fact that the variational approximation is a product is confusing in that the posterior distribution on the regression coefficients is unlikely to exhibit posterior independence.
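To make the product form concrete, here is a sketch of what such a factorised approximation typically looks like. The notation is an assumption of this sketch and not taken from the paper: $\phi_j$ stands for an inclusion probability, $\mu_j$ and $s_j^2$ for the slab mean and variance, and $\delta_0$ for the point mass at zero.

$$q(\beta_1,\ldots,\beta_p)=\prod_{j=1}^{p}\left\{\phi_j\,\mathcal{N}(\beta_j;\mu_j,s_j^2)+(1-\phi_j)\,\delta_0(\beta_j)\right\}$$

Each factor involves its own coefficient only, so any posterior correlation between the $\beta_j$'s is lost by construction.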

## Archive for Bayesian lasso

## variational Bayes for variable selection

Posted in Books, Statistics, University life with tags Bayesian lasso, consistency, EM algorithm, MAP estimators, MCMC, spike-and-slab prior, variable selection, variational Bayes methods on March 30, 2016 by xi'an

## O-Bayes15 [day #1]

Posted in Books, pictures, Running, Statistics, Travel, University life, Wines with tags Bayesian lasso, Bernstein-von Mises theorem, objective Bayes, robustness, Susie Bayarri, Valencia conferences, Valencia meeting on June 3, 2015 by xi'an

**S**o here we are back together to talk about objective Bayes methods, and in the City of Valencià as well! A move back to a city where the 1998 O’Bayes took place. In contrast with my introductory tutorial, the morning tutorials by Luis Pericchi and Judith Rousseau were fairly technical and advanced, Judith looking at the tools used in the frequentist (Bernstein-von Mises) analysis of priors, with forays into empirical Bayes, giving insights into a wide range of recent papers in the field. And Luis covering works on Bayesian robustness in the sense of resisting over-influential observations. Following works of his and of Tony O’Hagan and coauthors. Which means characterising tails of prior versus sampling distribution to allow for the posterior reverting to the prior in case of over-influential datapoints. Funnily enough, after a great opening by Carmen and Ed remembering Susie, Chris Holmes also covered Bayesian robust analysis. More in the sense of incompletely or mis-specified models. (On the side, rekindling one comment by Susie and the need to embed robust Bayesian analysis within decision theory.) Which was also very much Chris’ point, in line with the recent Watson and Holmes’ paper. Dan Simpson in his usual kick-the-anthill-real-hard-and-set-fire-to-it discussion pointed out the possible discrepancy between objective and robust Bayesian analysis. (With lines like “modern statistics has proven disruptive to objective Bayes”.) Which is not that obvious because the robust approach simply reincorporates the decision theory within the objective framework. (Dan also concluded with the comic strip below, whose message can be interpreted in many ways…! Or not.)

The second talk of the afternoon was given by Veronika Ročková on a novel type of spike-and-slab prior to handle sparse regression, bringing an alternative to the standard Lasso. The prior is a mixture of two Laplace priors whose scales are constrained in connection with the actual number of non-zero coefficients. I had not heard of this approach before (although Veronika and Ed have an earlier paper on a spike-and-slab prior to handle multicollinearity that Veronika presented in Boston last year) and I was quite impressed by the combination of minimax properties and practical determination of the scales. As well as by the performance of this spike-and-slab Lasso. I am looking forward to the forthcoming paper!
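For the record, a mixture of two Laplace priors of this kind can be written as below. This is a sketch in assumed notation, not necessarily that of the talk: $\theta$ is the mixing weight, and $\lambda_0\gg\lambda_1$ are the rates of the spike and of the slab, respectively.

$$\pi(\beta_j\mid\theta)=\theta\,\frac{\lambda_1}{2}\,e^{-\lambda_1|\beta_j|}+(1-\theta)\,\frac{\lambda_0}{2}\,e^{-\lambda_0|\beta_j|}$$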

The day ended most nicely in the botanical gardens of the University of Valencià, with an outdoor reception surrounded by palm trees and parakeet cries…

## this issue of Series B

Posted in Books, Statistics, Travel, University life with tags bag of little bootstraps, Bayesian bridge, Bayesian lasso, JRSSB, marginal likelihood, Markov chain Monte Carlo, normalising constant, Series B, simulation, untractable normalizing constant, Wasserman's paradox on September 5, 2014 by xi'an

**T**he September issue of [JRSS] Series B I received a few days ago is of particular interest to me. (And not as an ex-co-editor since I was never involved in any of those papers!) To wit: a paper by Hani Doss and Aixin Tan on evaluating normalising constants based on MCMC output, a preliminary version of which I had seen at a previous JSM meeting, a paper by Nick Polson, James Scott and Jesse Windle on the Bayesian bridge, connected with Nick’s talk in Boston earlier this month, and yet another paper by Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar and Michael Jordan on the bag of little bootstraps, a presentation I heard Michael deliver a few times when he was in Paris. (Obviously, this does not imply any negative judgement on the other papers of this issue!)

For instance, Doss and Tan consider the multiple mixture estimator [my wording, the authors do not give the method a name, referring to Vardi (1985) but missing the connection with Owen and Zhou (2000)] of k ratios of normalising constants, namely

where the z’s are the normalising constants and with possibly different numbers of iterations of each Markov chain. An interesting starting point (that Hans Künsch had mentioned to me a while ago but that I had since then forgotten) is that the problem was reformulated by Charlie Geyer (1994) as a quasi-likelihood estimation where the ratios of all z’s relative to one reference density are the unknowns. This is doubly interesting, actually, because it recasts the constant estimation problem in a statistical light and thus somewhat relates to the infamous “paradox” raised by Larry Wasserman a while ago. The novelty in the paper is (a) to derive an optimal estimator of the ratios of normalising constants in the Markov case, essentially accounting for possibly different lengths of the Markov chains, and (b) to estimate the variance matrix of the ratio estimate by regeneration arguments. A favourite tool of mine, at least theoretically, as practically useful minorising conditions are hard to come by, if at all available.
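To fix ideas about estimating all ratios relative to one reference density, here is a minimal Python sketch of a pooled-sample (multiple mixture) fixed-point iteration. It is an illustration under stated assumptions, not the Doss and Tan estimator: the draws are iid rather than Markov chains, the two unnormalised Gaussian densities are toy choices, and the names `ratio_estimates` and `demo` are hypothetical.

```python
import math
import random


def ratio_estimates(samples, q, n_iter=200, tol=1e-10):
    """Estimate z_r / z_1 for unnormalised densities q[r] from pooled draws.

    samples[j] holds draws from q[j] / z_j. The update used is
        z_r <- sum over pooled x of q_r(x) / sum_s n_s q_s(x) / z_s,
    rescaled so that the first (reference) constant equals one.
    """
    k = len(samples)
    sizes = [len(s) for s in samples]
    # evaluate every unnormalised density at every pooled draw, once
    table = [[f(x) for f in q] for s in samples for x in s]
    z = [1.0] * k
    for _ in range(n_iter):
        new_z = [0.0] * k
        for row in table:
            denom = sum(sizes[s] * row[s] / z[s] for s in range(k))
            for r in range(k):
                new_z[r] += row[r] / denom
        new_z = [v / new_z[0] for v in new_z]  # only ratios are identified
        converged = max(abs(a - b) for a, b in zip(new_z, z)) < tol
        z = new_z
        if converged:
            break
    return z


def demo(seed=1, n=2000):
    """Toy check: N(0,1) versus N(0,4) kernels, true ratio z_2 / z_1 = 2."""
    rng = random.Random(seed)
    q = [lambda x: math.exp(-0.5 * x * x),    # z_1 = sqrt(2 pi)
         lambda x: math.exp(-0.125 * x * x)]  # z_2 = 2 sqrt(2 pi)
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n)],
               [rng.gauss(0.0, 2.0) for _ in range(n)]]
    return ratio_estimates(samples, q)


if __name__ == "__main__":
    print(demo())
```

The rescaling by the first constant reflects the fact that only ratios are identified, which is the sense in which the ratios relative to one reference density are the unknowns.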

## Monte Carlo Statistical Methods third edition

Posted in Books, R, Statistics, University life with tags Bayesian lasso, hierarchical Bayesian modelling, Introducing Monte Carlo Methods with R, Markov chains, MCMC, mixture estimation, Monte Carlo Statistical Methods, nested sampling, perfect sampling, Peskun ordering, R, Rao-Blackwellisation, regeneration, slice sampling on September 23, 2010 by xi'an

**L**ast week, George Casella and I worked around the clock on starting the third edition of *Monte Carlo Statistical Methods* by detailing the changes to make and designing the new table of contents. The new edition will not see a revolution in the presentation of the material but rather a more mature perspective on what matters most in statistical simulation:

## The day I invented Bayesian Lasso…

Posted in Books, Statistics with tags Bayesian decision theory, Bayesian lasso, joke, The Bayesian Choice on August 16, 2010 by xi'an

**G**eorge Casella remarked to me last month in Padova that, once he and Trevor Park published **The Bayesian Lasso** in JASA, they received many claims for prior discovery of “Bayesian Lasso”! So, as a joke, let me add my claim as well! Indeed, in the first (1994) edition of *The Bayesian Choice*, I included an example in Chapter 4 (Example 4.2) about the fact that using a double exponential prior along with a Cauchy likelihood was producing a zero MAP (maximum a posteriori) estimate. Isn’t that the essence of the Bayesian lasso?! Of course, as you can still check in the current edition, the example was intended as a counter-example to the use of MAP estimates, not as an argument about the parsimony induced by double exponential priors. (Exercise 4.6 in both editions builds upon this example to notice that, with a small enough scale parameter, the absolute shrinkage to zero vanishes.) I thus lost the opportunity of “inventing” the Bayesian Lasso! To my shame, I must add that in the even earlier 1992 French edition of the book, I made a mistake in the derivation of the MAP and hence completely missed the point!!!
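The zero MAP under a double exponential prior and a Cauchy likelihood is easy to check numerically. The following Python sketch is only an illustration, under my own parametrisation rather than the book's: a single observation `x`, a prior proportional to exp(-lam·|θ|), and a plain grid search standing in for the analytical derivation. The names `log_posterior` and `map_estimate` are hypothetical.

```python
import math


def log_posterior(theta, x, lam):
    """Log posterior up to a constant: Laplace prior with coefficient lam
    on |theta|, Cauchy likelihood for one observation x centred at theta."""
    return -lam * abs(theta) - math.log(1.0 + (x - theta) ** 2)


def map_estimate(x, lam, half_width=10.0, steps_per_unit=100):
    """Grid-search MAP; the grid is built from integers so it contains 0.0."""
    n = int(half_width * steps_per_unit)
    grid = [i / steps_per_unit for i in range(-n, n + 1)]
    return max(grid, key=lambda t: log_posterior(t, x, lam))


if __name__ == "__main__":
    for lam in (1.0, 0.1):
        for x in (0.5, 3.0):
            print(f"lam={lam}, x={x}: MAP = {map_estimate(x, lam)}")
```

With `lam = 1` the grid maximiser sits at zero whatever the observation, while a small coefficient such as `lam = 0.1` lets the MAP move away from zero and close to `x`.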