Archive for MCMC

variational Bayes for variable selection

Posted in Books, Statistics, University life on March 30, 2016 by xi'an

Lake Agnes, Canadian Rockies, July 2007

Xichen Huang, Jin Wang and Feng Liang have recently arXived a paper where they rely on variational Bayes in conjunction with a spike-and-slab prior modelling. This actually stems from an earlier paper by Carbonetto and Stephens (2012), the difference being in the implementation of the method, which is less Gibbs-like for the current paper. The approach is not fully Bayesian in that not only is an approximate (variational) representation used for the parameters of interest (regression coefficients and presence-absence indicators), but the nuisance parameters are also replaced with MAP estimates. The variational approximation on the regression parameters is an independent product of spike-and-slab distributions. The authors show the approximate approach is consistent in both frequentist and Bayesian terms (under identifiability assumptions). The method is undoubtedly faster than MCMC since it shares many features with EM, but I still wonder at the Bayesian interpretability of the outcome, which takes the form of a product of estimated spike-and-slab mixtures. First, the weights in the mixtures are estimated by EM, hence fixed. Second, the fact that the variational approximation is a product is confusing in that the posterior distribution on the regression coefficients is unlikely to exhibit posterior independence.
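As a minimal illustration of what such a product-form approximation looks like, here is a sketch (mine, not the authors' implementation) of simulating from an independent product of spike-and-slab marginals; the inclusion probabilities alpha, slab means m and slab variances s2 are hypothetical placeholders for whatever the variational/EM scheme would actually estimate:

#q(beta) = prod_j [ alpha_j N(m_j, s2_j) + (1-alpha_j) delta_0 ]
#(alpha, m, s2 are placeholders standing in for estimated variational parameters)
rqss=function(n,alpha,m,s2){
  p=length(alpha)
  beta=matrix(0,n,p)
  for (j in 1:p){
    slab=(runif(n)<alpha[j])              #slab (inclusion) indicators
    beta[slab,j]=rnorm(sum(slab),m[j],sqrt(s2[j]))}
  beta}
alpha=c(.9,.8,.1,.05,.5); m=c(1.2,-.7,0,0,.3); s2=rep(.04,5)
draws=rqss(1e4,alpha,m,s2)
colMeans(draws!=0)                        #recovers the inclusion probabilities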

at CIRM [#3]

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life on March 4, 2016 by xi'an

Simon Barthelmé gave his mini-course on EP, with loads of details on the implementation of the method. Focussing on the EP-ABC and MCMC-EP versions today. Leaving open the difficulty of assessing to which limit EP is converging. But mentioning the potential for asynchronous EP (on which I would like to hear more). Ironically using a logistic regression example several times, if not on the Pima Indians benchmark! He also talked about approximate EP solutions that relate to consensus MCMC. With a connection to Mark Beaumont’s talk at NIPS [at the same time as mine!] on the comparison with ABC. While we saw several talks on EP during this week, I am still agnostic about the potential of the approach. It certainly produces a fast proxy to the true posterior and hence can be exploited ad nauseam in inference methods based on pseudo-models like indirect inference. In conjunction with other quick and dirty approximations when available. As in ABC, it would be most useful to know how far from the (ideal) posterior distribution the approximation stands. Machine learning approaches presumably allow for an evaluation of the predictive performances, but less so for the modelling accuracy, even with new sampling steps. [But I know nothing, I know!]
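For readers unfamiliar with EP, its core move is a moment-matching (KL) projection of a tilted distribution back onto the Gaussian family. Here is a small numerical illustration of that single projection step (a generic sketch of my own, unrelated to the EP-ABC and MCMC-EP versions covered in the course):

#one EP-style site update: project cavity x likelihood onto a Gaussian
#by matching the first two moments (computed here by numerical integration)
cav=function(x) dnorm(x,1,2)              #Gaussian cavity N(1,4)
tilt=function(x) pnorm(x)*cav(x)          #unnormalised tilted distribution
Z=integrate(tilt,-Inf,Inf)$value          #normalising constant
mu=integrate(function(x) x*tilt(x),-Inf,Inf)$value/Z
s2=integrate(function(x) x^2*tilt(x),-Inf,Inf)$value/Z-mu^2
c(Z,mu,s2)                                #the matched Gaussian is N(mu,s2)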

Dennis Prangle presented some on-going research on high dimension [data] ABC. Raising the question of what is the true meaning of dimension in ABC algorithms. Or of sample size. Because the inference relies on the event d(s(y),s(y′)) ≤ ξ or on the likelihood l(θ|x). Both one-dimensional. Mentioning Iain Murray’s talk at NIPS [that I also missed]. Re-expressing as well the perspective that ABC can be seen as a missing or estimated normalising constant problem, as in Bornn et al. (2015) that I discussed earlier. The central idea is to use SMC to simulate a particle cloud evolving as the target tolerance ξ decreases. Which supposes a latent variable structure lurking in the background.
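To make the particle-cloud idea concrete, here is a toy, weight-free ABC-SMC pass (my sketch, not the algorithm presented in the talk): the target is the posterior on the mean θ of a N(θ,1) sample under a uniform prior, and each particle is refreshed by an ABC random-walk move as the tolerance ξ decreases.

#toy ABC-SMC sketch with a decreasing tolerance schedule
#(simplified: uniform prior, symmetric moves, no importance-weight correction)
set.seed(1)
y=rnorm(50,2); sobs=mean(y)               #observed data and summary
N=1e3
theta=runif(N,-5,5)                       #particle cloud started from the prior
for (xi in c(1,.5,.25,.1,.05)){           #decreasing tolerances
  prop=theta+rnorm(N,0,.5)                #random-walk move on each particle
  spro=sapply(prop,function(th) mean(rnorm(50,th)))
  ok=(abs(spro-sobs)<xi)&(abs(prop)<5)    #within tolerance and prior support
  theta[ok]=prop[ok]}
c(mean(theta),mean(y))                    #ABC approximation vs data average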

Judith Rousseau gave her talk on non-parametric mixtures and the possibility to learn parametrically about the component weights. Starting with a rather “magic” result by Allman et al. (2009) that, with three repeated observations per individual, all terms in a mixture are identifiable. Maybe related to that simpler fact that mixtures of Bernoullis are not identifiable while mixtures of Binomials are identifiable, even when n=2. As “shown” in this plot made for X validated. Actually truly related, because Allman et al. (2009) prove identifiability through a finite dimensional model. (I am surprised I missed this most interesting paper!) With the side condition that a mixture of p components made of r Bernoulli products is identifiable when p ≥ 2⌈log₂ r⌉ + 1, where log₂ is the base-2 logarithm and ⌈x⌉ the upper rounding (ceiling). I also find most relevant this distinction between the weights and the remainder of the mixture, as weights behave quite differently, hardly being parameters in a sense.
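The Bernoulli versus Binomial(2) contrast is easy to check by hand: a two-component Bernoulli mixture is itself a Bernoulli with the averaged parameter, so many distinct parameterisations share the same law, while the extra P(X=2) cell of a Binomial(2) mixture already separates them. A quick numerical check (mine, not the argument given in the talk):

mixpmf=function(x,w,p1,p2,n) w*dbinom(x,n,p1)+(1-w)*dbinom(x,n,p2)
#two distinct parameterisations sharing the same mean .4
mixpmf(0:1,.50,.2,.6,1); mixpmf(0:1,.25,.1,.5,1)   #identical Bernoulli mixtures
mixpmf(0:2,.50,.2,.6,2); mixpmf(0:2,.25,.1,.5,2)   #distinct Binomial(2) mixtures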

the last digit of e

Posted in Kids, Mountains, pictures, Statistics, Travel, University life on March 3, 2016 by xi'an

Éric Marchand from Sherbrooke, Québec [historical birthplace of MCMC, since Adrian Smith gave his first talk on his Gibbs sampler there, in June 1989], noticed my recent posts about the approximation of e by Monte Carlo methods and sent me a paper he wrote in The Mathematical Gazette of November 1995 [full MCMC era!] with original proofs that the expectation of some stopping rules is e, like the length of increasing runs. And Gnedenko’s uniform summation until exceeding one. Amazing that this simple problem generated so much investigation!!!
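Gnedenko’s rule is immediate to check by simulation: the number of iid uniforms needed for their running sum to exceed one has expectation e. A quick Monte Carlo confirmation (just an illustration, not one of Marchand’s proofs):

#number of U(0,1) draws needed for the running sum to exceed one: E[N]=e
nsum=function(){
  s=0; n=0
  while (s<1){ s=s+runif(1); n=n+1 }
  n}
set.seed(42)
mean(replicate(1e5,nsum()))               #close to exp(1)=2.71828...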

Bayesian week in a statistics month at CIRM

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines on February 28, 2016 by xi'an

Calanque de Morgiou, Marseille, July 7, 2010

As posted earlier, this week is a Bayesian week at CIRM, the French mathematical society centre near Marseilles. Where we meet with about 80 researchers and students interested in Bayesian statistics, from all possible sides. (And possibly in climbing in the Calanques and trail running, if not swimming at this time of year…) With Jean-Michel we will be teaching a short course on Bayesian computational methods, namely ABC and MCMC, over the first two days… Here are my slides for the MCMC side:

As should be obvious from the first slides, this is a very introductory course that should only appeal to students with no previous exposure. The remainder of the week will see advanced talks on the state-of-the-art Bayesian computational methods, including some on noisy MCMC and on the mysterious expectation-propagation technique.

next BayesComp conference planned for Jan 2018, any volunteer?

Posted in Kids, Mountains, Statistics, Travel, University life on February 25, 2016 by xi'an

MCMSki III poster, 2010 (C.) IMS

[A call from the BayesComp section of ISBA for the next Bayesian computation meeting! As suggested in an earlier post, the label MCMski is discontinued to allow for any location amenable to organising a 200-plus-participant meeting in good and hopefully reasonably priced conditions.]

The Bayesian Computation Section of ISBA is soliciting proposals to host its flagship meeting: BayesComp 2018

The expectation is that the meeting will be held in January 2018, but the committee will consider proposals for other times through January 2019. This meeting is a continuation of the popular MCMSki on recent advances in the theory and application of Bayesian computational methods such as MCMC. The tradition was to hold MCMski meetings in ski resorts, but, as the name change suggests, we encourage applications from any venue that could support BC2018.

A three-day meeting is planned, perhaps with an additional day or two of satellite meetings and/or short courses. One page proposals should address feasibility of hosting the meeting including

1. Proposed dates.
2. Transportation for international participants (both the proximity of international airports and transportation to/from the venue).
3. The conference facilities.
4. The availability and cost of hotels, including low cost options.
5. The proposed local organizing committee and their collective experience organizing international meetings.
6. Expected or promised contributions from the host organization, host country, or industrial partners towards the cost of running the meetings.

Proposals should be submitted to Nicolas Chopin (Program Chair) no later than May 31, 2016. The Board of Bayesian Computing Section will evaluate the proposals, choose a venue, and appoint the Program Committee for BayesComp 2018.

patterns of scalable Bayesian inference

Posted in Books, Statistics, University life on February 24, 2016 by xi'an

Elaine Angelino, Matthew Johnson and Ryan Adams just arXived a massive survey of 118 pages on scalable Bayesian inference, which could have been entitled Bayes for Big Data, as this monograph covers state-of-the-art computational approaches to large and complex data structures. I did not read each and every line of it, but I have already recommended it to my PhD students. Some of its material unsurprisingly draws from the recent survey by Rémi Bardenet et al. (2015) I discussed a while ago. It also relates rather frequently to the somewhat parallel ICML paper of Korattikara et al. (2014). And to the firefly Monte Carlo procedure also discussed previously here.

Chapter 2 provides some standard background on computational techniques, Chapter 3 covers MCMC with data subsets, Chapter 4 gives some entries on MCMC with parallel and distributed architectures, Chapter 5 focuses on variational solutions, and Chapter 6 is about open questions and challenges.

“Insisting on zero asymptotic bias from Monte Carlo estimates of expectations may leave us swamped in errors from high variance or transient bias.”

One central theme of the paper is the need for approximate solutions, MCMC being perceived as the exact solution. (Somewhat wrongly, in the sense that the output of an MCMC run is at best an empirical version of the true posterior, hence endowed with a residual and incompressible variation for a given computing budget.) While Chapter 3 stresses the issue of assessing the distance to the true posterior, it does not dwell at all on computing times and budget, which is arguably a much harder problem. Chapter 4 seems to be more aware of this issue, since it argues that “a way to use parallel computing resources is to run multiple sequential MCMC algorithms at once [but that this] does not reduce the transient bias in MCMC estimates of posterior expectations” (p.54). The alternatives are to use either prefetching (which was the central theme of Elaine Angelino’s thesis), asynchronous Gibbs with the (new to me?) Hogwild Gibbs algorithms (connected to Terenin et al.’s recent paper, not quoted in the survey), or some versions of consensus Monte Carlo covered in earlier posts; the missing links are, in my humble opinion, an assessment of the worth of those solutions (in the spirit of “here’s the solution, what was the problem again?”) and, once again, the computing time issue. Chapter 5 briefly discusses some recent developments in variational mean field approximations, which is farther from my interests and (limited) competence, but which appears as a particular class of approximate models and thus could (and should?) relate to likelihood-free methods. Chapter 6, about the current challenges of the field, is presumably the most interesting in this monograph in that it produces open questions and suggests directions for future research. For instance, contrasting the long-term MCMC error with the short-term transient part. Or the issue of comparing different implementations in a practical and timely perspective.
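Since consensus Monte Carlo comes up repeatedly above, here is a minimal sketch of its simplest, Gaussian-friendly version (my own toy code under a flat prior, not taken from the survey): each shard runs its own random-walk Metropolis sampler on its sub-posterior, and the shard draws are then combined by a precision-weighted average, a combination that is only exact when everything is Gaussian.

#toy consensus Monte Carlo: K shards, per-shard Metropolis, weighted recombination
set.seed(7)
y=rnorm(1e4,3,2); K=10; T=1e3
shards=split(y,rep(1:K,length.out=length(y)))
draws=sapply(shards,function(ys){
  th=mean(ys); out=rep(0,T)
  for (t in 1:T){
    prop=th+rnorm(1,0,.1)                 #random-walk proposal on the mean
    lr=sum(dnorm(ys,prop,2,log=TRUE))-sum(dnorm(ys,th,2,log=TRUE))
    if (log(runif(1))<lr) th=prop
    out[t]=th}
  out})
w=1/apply(draws,2,var)                    #precision weights for each shard
consensus=as.vector(draws%*%w/sum(w))     #weighted average of shard draws
c(mean(consensus),mean(y))                #close to the full-data posterior mean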

high dimension Metropolis-Hastings algorithms

Posted in Books, Kids, Mountains, pictures, R, Statistics on January 26, 2016 by xi'an

When discussing high dimension models with Ingmar Schuster [not Schüster: blame my fascination for accented characters!] the other day, we came across the following paradox with Metropolis-Hastings algorithms. When attempting to simulate from a multivariate standard normal distribution in a large dimension, starting from the mode of the target, i.e., its mean μ, leaving the mode is extremely unlikely, given the huge drop between the value of the density at the mode μ and at likely realisations (corresponding to the blue sequence). Even when relying on the very scale that makes the proposal identical to the target! Resorting to a tiny scale like Σ/p manages to escape the unhealthy neighbourhood of the highly unlikely mode (as shown with the brown sequence).

Here is the corresponding R code:

library(MASS)    #for mvrnorm
library(mvtnorm) #for dmvnorm
p=100
T=1e3
mu=rep(0,p)      #target mean, also its mode
sigma=diag(p)    #target covariance
mh=mu            #mode as starting value
vale=rep(0,T)
for (t in 1:T){
prop=mvrnorm(1,mh,sigma/p) #down-scaled proposal; use sigma for the full-scale version
if (log(runif(1))<dmvnorm(prop,mu,sigma,log=TRUE)-
   dmvnorm(mh,mu,sigma,log=TRUE)) mh=prop
vale[t]=dmvnorm(mh,mu,sigma,log=TRUE)}
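To get a sense of the drop mentioned above (a quick check of mine, not part of the original experiment), compare the log-density at the mode with its value at a typical draw from the target: in dimension p=100 the gap is of order p/2, i.e. about 50 on the log scale, which a full-scale proposal started at the mode almost never recovers from.

dmvnorm(mu,mu,sigma,log=TRUE)                  #log-density at the mode
dmvnorm(mvrnorm(1,mu,sigma),mu,sigma,log=TRUE) #at a typical realisation, about p/2 lower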