## patterns of scalable Bayesian inference

**E**laine Angelino, Matthew Johnson and Ryan Adams just arXived a massive survey of 118 pages on scalable Bayesian inference, which could have been entitled *Bayes for Big Data*, as this monograph covers state-of-the-art computational approaches to large and complex data structures. I did not read each and every line of it, but I have already recommended it to my PhD students. Some of its material unsurprisingly draws from the recent survey by Rémi Bardenet et al. (2015) I discussed a while ago. It also relates rather frequently to the somewhat parallel ICML paper of Korattikara et al. (2014). And to the firefly Monte Carlo procedure also discussed previously here.

Chapter 2 provides some standard background on computational techniques, Chapter 3 covers MCMC with data subsets, Chapter 4 gives some entries on MCMC with parallel and distributed architectures, Chapter 5 focuses on variational solutions, and Chapter 6 is about open questions and challenges.

“Insisting on zero asymptotic bias from Monte Carlo estimates of expectations may leave us swamped in errors from high variance or transient bias.”

One central theme of the paper is the need for approximate solutions, MCMC being perceived as the exact solution. (Somewhat wrongly, in the sense that the product of an MCMC run is at best an empirical version of the true posterior, hence endowed with a residual and incompressible variation for a given computing budget.) While Chapter 3 stresses the issue of assessing the distance to the true posterior, it does not dwell at all on computing times and budget, which is arguably a much harder problem. Chapter 4 seems to be more aware of this issue, since it argues that “a way to use parallel computing resources is to run multiple sequential MCMC algorithms at once [but that this] does not reduce the transient bias in MCMC estimates of posterior expectations” (p.54). The alternatives are to use prefetching (which was the central theme of Elaine Angelino’s thesis), asynchronous Gibbs with the new-to-me (?) Hogwild Gibbs algorithms (connected with Terenin et al.’s recent paper, which is not cited in the survey), or some versions of consensus Monte Carlo covered in earlier posts. The missing links are, in my humble opinion, an assessment of the worth of those solutions (in the spirit of “here’s the solution, what was the problem again?”) and, once again, the computing time issue. Chapter 5 briefly discusses some recent developments in variational mean field approximations, which is farther from my interests and (limited) competence, but which appears as a particular class of approximate models and thus could (and should?) relate to likelihood-free methods. Chapter 6, about the current challenges of the field, is presumably the most interesting in this monograph in that it produces open questions and suggests directions for future research. For instance, opposing the long-term MCMC error to the short-term transient part. Or the issue of comparing different implementations in a practical and timely perspective.
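The quoted point about transient bias can be illustrated with a toy experiment. The sketch below is not taken from the survey: it assumes a standard normal target, a random-walk Metropolis sampler, and a deliberately poor starting point (all hypothetical choices), and compares pooling many short chains with running one long chain on the same total budget.

```python
import math
import random


def rw_metropolis(n_steps, x0, step, rng):
    """Random-walk Metropolis chain targeting a standard normal N(0, 1)."""
    x, draws = x0, []
    for _ in range(n_steps):
        proposal = x + step * rng.gauss(0.0, 1.0)
        # Log acceptance ratio for the N(0, 1) target.
        log_ratio = 0.5 * (x * x - proposal * proposal)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
        draws.append(x)
    return draws


def mean(values):
    return sum(values) / len(values)


rng = random.Random(1)
x0, step = 10.0, 0.5  # poor start and step size: illustrative assumptions

# The true posterior mean is 0. Compare three estimates of it:
# one short chain, 200 short chains pooled, and one long chain that
# uses the same total number of steps as the 200 short chains.
one_short = mean(rw_metropolis(50, x0, step, rng))
many_short = mean(
    [x for _ in range(200) for x in rw_metropolis(50, x0, step, rng)]
)
one_long = mean(rw_metropolis(200 * 50, x0, step, rng))

print(f"1 chain x 50 steps:     {one_short:.2f}")
print(f"200 chains x 50 steps:  {many_short:.2f}")
print(f"1 chain x 10000 steps:  {one_long:.2f}")
```

Pooling more short chains shrinks the variance of the estimate, but every chain carries the same transient from the bad start, so the pooled average stays far from 0, whereas the single long chain dilutes that transient over its whole length. The cost is that the long chain cannot be split across machines this naively, which is the computing time issue again.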

This entry was posted on February 24, 2016 at 12:16 am and is filed under Books, Statistics, University life with tags ABC, Approximate Bayesian computation, approximate target, asynchronous algorithms, embarassingly parallel, MCMC, Monte Carlo Statistical Methods, noisy MCMC, parallelisation, prefetching, scalability, transience, variational Bayes methods.

January 4, 2017 at 4:18 pm

This review has now appeared in Foundations and Trends in Machine Learning, Vol. 9, No. 2-3 (2016) 119–247, (c.) 2016 E. Angelino, M. J. Johnson, and R. P. Adams, DOI: 10.1561/2200000052

January 4, 2017 at 5:04 pm

So, the hard copy is sold at $95 and the E-book at $260 (despite being available on arXiv). I’m puzzled.

January 4, 2017 at 6:40 pm

Yes, the reasons for pricing papers are beyond rationality!

February 29, 2016 at 1:14 am

It was a bit of a surprise that our Gibbs paper didn’t get a look in – the second author has definitely read it.

February 25, 2016 at 6:59 am

The basic idea of variational inference is to fit a parameterized distribution, say q(x) (usually in the exponential family), to the posterior distribution p(x|y) by minimizing, e.g., the Kullback-Leibler divergence KL(q || p) as a function of the parameters of q(x). You still have to be able to evaluate the likelihood.
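As a minimal sketch of that recipe, under assumptions that are not from the survey: take the toy model y_i ~ N(x, 1) with prior x ~ N(0, 10²), and let q(x) = N(m, s²). Minimizing KL(q || p) over (m, s) is then equivalent to maximizing the evidence lower bound (ELBO), which is available in closed form here, so plain gradient ascent suffices. The function name `fit_gaussian_vi` and all settings are hypothetical.

```python
import math
import random


def fit_gaussian_vi(y, prior_sd=10.0, n_iter=5000):
    """Fit q(x) = N(m, s^2) to p(x | y) by gradient ascent on the ELBO.

    Toy model (illustrative assumption): y_i ~ N(x, 1), x ~ N(0, prior_sd^2).
    Up to an additive constant,
      ELBO(m, s) = -0.5 * sum((y_i - m)^2 + s^2)
                   - 0.5 * (m^2 + s^2) / prior_sd^2 + log(s),
    and maximizing it minimizes KL(q || p).
    """
    n, sum_y = len(y), sum(y)
    prior_prec = 1.0 / prior_sd**2
    lr = 0.5 / (n + prior_prec)  # step size kept small enough to be stable here
    m, log_s = 0.0, 0.0          # optimize log(s) so that s stays positive
    for _ in range(n_iter):
        s2 = math.exp(2.0 * log_s)
        grad_m = (sum_y - n * m) - prior_prec * m
        grad_log_s = 1.0 - (n + prior_prec) * s2
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, math.exp(log_s)


rng = random.Random(0)
y = [rng.gauss(2.0, 1.0) for _ in range(50)]  # simulated data

m, s = fit_gaussian_vi(y)

# In this conjugate model the exact posterior is Gaussian, so the fitted q
# can be checked against it directly.
post_prec = len(y) + 1.0 / 10.0**2
print(f"variational: mean={m:.4f}, sd={s:.4f}")
print(f"exact:       mean={sum(y) / post_prec:.4f}, sd={post_prec ** -0.5:.4f}")
```

Here q can recover the posterior exactly because the posterior itself is Gaussian; in general the chosen family only approximates it, and the likelihood terms inside the ELBO still have to be evaluated, as the comment says.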

It is indeed a good paper for getting a fairly up to date (anno 2014-ish) view on what is happening with scaling Bayesian inference to the “big data” regime.

February 27, 2016 at 10:03 pm

Thanks, I am aware of what variational Bayes stands for but cannot analyse how many of the current developments in this area are covered by the survey.