**X**ichen Huang, Jin Wang and Feng Liang have recently arXived a paper where they rely on variational Bayes in conjunction with spike-and-slab prior modelling. This actually stems from an earlier paper by Carbonetto and Stephens (2012), the difference being in the implementation of the method, which is less Gibbs-like in the current paper. The approach is not fully Bayesian in that not only is an approximate (variational) representation used for the parameters of interest (regression coefficients and presence-absence indicators), but the nuisance parameters are also replaced with MAP estimates. The variational approximation on the regression parameters is an independent product of spike-and-slab distributions. The authors show the approximate approach is consistent in both frequentist and Bayesian terms (under identifiability assumptions). The method is undoubtedly faster than MCMC since it shares many features with EM, but I still wonder at the Bayesian interpretability of the outcome, which is written as a product of estimated spike-and-slab mixtures. First, the weights in the mixtures are estimated by EM, hence fixed. Second, the fact that the variational approximation is a product is confusing in that the posterior distribution on the regression coefficients is unlikely to exhibit posterior independence.
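To make the product form concrete, here is a minimal R sketch of simulating from an independent product of spike-and-slab distributions, each coordinate mixing a point mass at zero with a Normal slab. The weights `phi`, slab means `m` and slab standard deviations `s` are made-up values for illustration, not quantities from the paper, and the Normal slab is my assumption.

```r
set.seed(1)
phi=c(.9,.5,.1)   # inclusion weights (hypothetical values)
m=c(2,-1,.5)      # slab means (hypothetical values)
s=c(.3,.3,.3)     # slab standard deviations (hypothetical values)
N=1e5
# column j holds independent draws of the j-th inclusion indicator
gam=matrix(rbinom(N*3,1,rep(phi,each=N)),ncol=3)
# column j is zero when excluded and a N(m[j],s[j]^2) draw when included
beta=gam*matrix(rnorm(N*3,rep(m,each=N),rep(s,each=N)),ncol=3)
colMeans(beta!=0)   # empirical inclusion frequencies, to compare with phi
round(cor(beta),2)  # off-diagonal terms near zero, by construction
```

The near-zero correlations are built into the approximation rather than learned from the data, which is the source of my puzzlement about the product form.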

## Archive for MCMC

## variational Bayes for variable selection

Posted in Books, Statistics, University life with tags Bayesian lasso, consistency, EM algorithm, MAP estimators, MCMC, spike-and-slab prior, variable selection, variational Bayes methods on March 30, 2016 by xi'an

## at CIRM [#3]

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life with tags ABC, ABC-SMC, Bayesian statistics, CIRM, component of a mixture, cross validated, expectation-propagation, high dimensions, identifiability, Luminy, Marseille, MCMC, Mont Puget, Monte Carlo Statistical Methods, particle filter, particle Gibbs sampler, summer school on March 4, 2016 by xi'an

**S**imon Barthelmé gave his mini-course on EP, with loads of details on the implementation of the method. Focussing on the EP-ABC and MCMC-EP versions today. Leaving open the difficulty of assessing to which limit EP is converging. But mentioning the potential for asynchronous EP (on which I would like to hear more). Ironically using a logistic regression example several times, if not on the Pima Indians benchmark! He also talked about approximate EP solutions that relate to consensus MCMC. With a connection to Mark Beaumont’s talk at NIPS [at the same time as mine!] on the comparison with ABC. While we saw several talks on EP during this week, I am still agnostic about the potential of the approach. It certainly produces a fast proxy to the true posterior and hence can be exploited *ad nauseam* in inference methods based on pseudo-models like indirect inference. In conjunction with other quick and dirty approximations when available. As in ABC, it would be most useful to know how far from the (ideal) posterior distribution the approximation stands. Machine learning approaches presumably allow for an evaluation of the predictive performances, but less so for the modelling accuracy, even with new sampling steps. [But I know nothing, I know!]

Dennis Prangle presented some on-going research on high dimension [data] ABC. Raising the question of what is the true meaning of dimension in ABC algorithms. Or of sample size. Because the inference relies on the event d(s(y),s(y’))≤ξ or on the likelihood l(θ|x). Both one-dimensional. Mentioning Iain Murray’s talk at NIPS [that I also missed]. Re-expressing as well the perspective that ABC can be seen as a missing or estimated normalising constant problem as in Bornn et al. (2015) I discussed earlier. The central idea is to use SMC to simulate a particle cloud evolving as the target tolerance ξ decreases. Which supposes a latent variable structure lurking in the background.
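As a toy illustration of the one-dimensional event d(s(y),s(y’))≤ξ and of what shrinking the tolerance ξ does, here is a minimal R sketch. It is plain rejection ABC run at several tolerances, not the SMC scheme of the talk, and the Normal model, the N(0,10) prior and the sample mean as summary statistic are all my assumptions.

```r
set.seed(42)
n=50
y=rnorm(n,mean=2)              # pseudo-observed sample (hypothetical)
s=function(x) mean(x)          # summary statistic (an assumption)
N=1e4
theta=rnorm(N,0,sqrt(10))      # draws from an assumed N(0,10) prior
# one-dimensional distance between simulated and observed summaries
dis=sapply(theta,function(th) abs(s(rnorm(n,mean=th))-s(y)))
xis=c(1,.5,.1)                 # decreasing tolerances
kept=sapply(xis,function(xi) sum(dis<=xi))           # accepted draws
post=sapply(xis,function(xi) mean(theta[dis<=xi]))   # ABC posterior means
rbind(xis,kept,post)
```

Whatever the size of the data, acceptance only depends on the scalar `dis`, and the number of accepted draws can only decrease with the tolerance.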

Judith Rousseau gave her talk on non-parametric mixtures and the possibility to learn parametrically about the component weights. Starting with a rather “magic” result by Allman et al. (2009) that, with three repeated observations per individual, all terms in a mixture are identifiable. Maybe related to the simpler fact that mixtures of Bernoullis are not identifiable while mixtures of Binomials are identifiable, even when n=2. As “shown” in this plot made for X validated. Actually truly related because Allman et al. (2009) prove identifiability through a finite dimensional model. (I am surprised I missed this most interesting paper!) With the side condition that a mixture of r components made of products of p Bernoullis is identifiable when p ≥ 2⌈log₂ r⌉ + 1, where log₂ is the base-2 logarithm. And ⌈x⌉ the upper rounding. I also find most relevant this distinction between the weights and the remainder of the mixture as weights behave quite differently, hardly parameters in a sense.
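A quick R check of the non-identifiability of Bernoulli mixtures, with made-up weights and probabilities: two different two-component mixtures collapse to the same Bernoulli distribution. The second part evaluates the bound 2⌈log₂ r⌉+1, reading r as the number of components (my reading of Allman et al.), in which case r=2 returns the three repeated observations mentioned above.

```r
# success probability of a two-component mixture of Bernoullis
mixprob=function(w,p1,p2) w*p1+(1-w)*p2
a=mixprob(.5,.2,.8)    # weights (.5,.5), components Be(.2) and Be(.8)
b=mixprob(.25,.8,.4)   # weights (.25,.75), components Be(.8) and Be(.4)
all.equal(a,b)         # the two mixtures share one success probability
# lower bound on the number of Bernoulli variables for r components
bound=function(r) 2*ceiling(log2(r))+1
sapply(2:5,bound)
```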

## the last digit of e

Posted in Kids, Mountains, pictures, Statistics, Travel, University life with tags Adrian Smith, Gibbs sampling, Gnedenko, Guy Medal in Gold, MCMC, Québec, Royal Statistical Society, Sherbrooke on March 3, 2016 by xi'an

**É**ric Marchand from Sherbrooke, Québec [historical birthplace of MCMC, since Adrian Smith gave his first talk on his Gibbs sampler there, in June 1989], noticed my recent posts about the approximation of e by Monte Carlo methods and sent me a paper he wrote in The Mathematical Gazette of November 1995 [full MCMC era!] about original proofs on the expectation of some stopping rules being e, like the length of increasing runs. And Gnedenko’s uniform summation until exceeding one. Amazing that this simple problem generated so much investigation!!!
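For the record, a short R sketch of both stopping rules, written by me rather than taken from the paper: the number of uniforms needed for the sum to exceed one, and the index of the first uniform that breaks an increasing run. Both have expectation e, the second because the probability that the first n uniforms are increasing is 1/n!.

```r
set.seed(3)
# Gnedenko: number of uniforms summed until the total exceeds one
gnedenko=function(){
  n=0; tot=0
  while (tot<=1){ tot=tot+runif(1); n=n+1 }
  n}
# index of the first uniform smaller than its predecessor
firstdrop=function(){
  n=1; u=runif(1)
  repeat{
    v=runif(1); n=n+1
    if (v<u) return(n)
    u=v}}
M=1e5
est=c(mean(replicate(M,gnedenko())),mean(replicate(M,firstdrop())))
rbind(est,e=exp(1))
```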

## Bayesian week in a statistics month at CIRM

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags ABC, Cassis, CIRM, climbing, Luminy, Marseiile, MCMC, Mont Puget, Monte Carlo Statistical Methods, Morgiou, Sugiton, trail running on February 28, 2016 by xi'an

As posted earlier, this week is a Bayesian week at CIRM, the French mathematical society centre near Marseilles. Where we meet with about 80 researchers and students interested in Bayesian statistics, from all possible sides. (And possibly in climbing in the Calanques and trail running, if not swimming at this time of year…) With Jean-Michel we will be teaching a short course on Bayesian computational methods, namely ABC and MCMC, over the first two days… Here are my slides for the MCMC side:

As should be obvious from the first slides, this is a very introductory course that should only appeal to students with no previous exposure. The remainder of the week will see advanced talks on the state-of-the-art Bayesian computational methods, including some on noisy MCMC and on the mysterious expectation-propagation technique.

## next BayesComp conference planned for Jan 2018, any volunteer?

Posted in Kids, Mountains, Statistics, Travel, University life with tags Adapski, BayesComp, Bayesian computing, Bayesian Computing Section, Bayesian conference, BC2018, Bormio, Chamonix, ISBA, Italy, Lenzerheide, MCMC, MCMSki, Monte Carlo Statistical Methods, Park City, satellite workshop, Switzerland, Utah on February 25, 2016 by xi'an

*[A call from the BayesComp section of ISBA for the next Bayesian computation meeting! As suggested in an earlier post, the label MCMSki is discontinued to allow for any location amenable to organise a 200-plus meeting in good and hopefully reasonably priced conditions.]*

**The Bayesian Computation Section of ISBA is soliciting proposals to host its flagship meeting: BayesComp 2018**

The expectation is that the meeting will be held in January 2018, but the committee will consider proposals for other times through January 2019. This meeting is a continuation of the popular MCMSki on recent advances in the theory and application of Bayesian computational methods such as MCMC. The tradition was to hold MCMSki meetings in ski resorts, but, as the name change suggests, we encourage applications from any venue that could support BC2018.

A three-day meeting is planned, perhaps with an additional day or two of satellite meetings and/or short courses. One page proposals should address feasibility of hosting the meeting including

1. Proposed dates.

2. Transportation for international participants (both the proximity of international airports and transportation to/from the venue).

3. The conference facilities.

4. The availability and cost of hotels, including low cost options.

5. The proposed local organizing committee and their collective experience organizing international meetings.

6. Expected or promised contributions from the host organization, host country, or industrial partners towards the cost of running the meetings.

*Proposals should be submitted to Nicolas Chopin (Program Chair) no later than May 31, 2016. The Board of Bayesian Computing Section will evaluate the proposals, choose a venue, and appoint the Program Committee for BayesComp 2018.*

## high dimension Metropolis-Hastings algorithms

Posted in Books, Kids, Mountains, pictures, R, Statistics with tags acceptance probability, curse of dimensionality, high dimensions, MCMC, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, unmlaut on January 26, 2016 by xi'an

**W**hen discussing high dimension models with Ingmar ~~Schüster~~ Schuster [blame my fascination for accented characters!] the other day, we came across the following paradox with Metropolis-Hastings algorithms. If attempting to simulate from a multivariate standard normal distribution in a large dimension, when starting from the mode of the target, i.e., its mean μ, leaving the mode μ is extremely unlikely, given the huge drop between the value of the density at the mode μ and at likely realisations (corresponding to the blue sequence). Even when relying on the very scale that makes the proposal identical to the target! Resorting to a tiny scale like Σ/p manages to escape the unhealthy neighbourhood of the highly unlikely mode (as shown with the brown sequence).

Here is the corresponding R code:

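```r
library(MASS) # for mvrnorm
p=100
T=1e3
# mu, sigma and logdmvnorm are not defined in the post:
# the three definitions below are assumptions (standard normal target)
mu=rep(0,p)
sigma=diag(p)
logdmvnorm=function(x,mu,sigma)
  -(length(x)*log(2*pi)+as.numeric(determinant(sigma)$modulus)+mahalanobis(x,mu,sigma))/2
mh=mu #mode as starting value
vale=rep(0,T)
for (t in 1:T){
  prop=mvrnorm(1,mh,sigma/p) # random walk proposal with scale sigma/p
  if (log(runif(1))<logdmvnorm(prop,mu,sigma)-
      logdmvnorm(mh,mu,sigma)) mh=prop
  vale[t]=logdmvnorm(mh,mu,sigma)}
```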
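The size of the drop can be checked directly. Assuming a standard normal target and a first move out of the mode, the log acceptance ratio is minus half a chi-square variate with p degrees of freedom under the scale Σ, and the same quantity divided by p under the scale Σ/p. The following R sketch, which is mine and only covers this first move, compares the two.

```r
set.seed(2)
p=100
M=1e4
z=matrix(rnorm(M*p),ncol=p)  # standardised moves away from the mode
chi=rowSums(z^2)             # chi-square variates with p degrees of freedom
full=-chi/2                  # log acceptance ratio under proposal scale sigma
tiny=-chi/(2*p)              # log acceptance ratio under proposal scale sigma/p
# average log ratio for the large scale, average acceptance for the tiny one
c(mean(full),mean(exp(tiny)))
# theoretical counterparts from the chi-square moment generating function
c(-p/2,(1+1/p)^(-p/2))
```

With p=100, the average acceptance probability under the scale Σ is 2^(-p/2), that is, of order 1e-15, while it stays around 0.6 under the scale Σ/p.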