Archive for scalability

at the Isaac Newton Institute

Posted in Statistics, Travel, University life with tags , , , , , on July 6, 2017 by xi'an

MCM 2017

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , on February 10, 2017 by xi'an

Je reviendrai à Montréal, as the song by Robert Charlebois goes, for the MCM 2017 meeting there, on July 3-7. I was invited to give a plenary talk by the organisers of the conference . Along with

Steffen Dereich, WWU Münster, Germany
Paul Dupuis, Brown University, Providence, USA
Mark Girolami, Imperial College London, UK
Emmanuel Gobet, École Polytechnique, Palaiseau, France
Aicke Hinrichs, Johannes Kepler University, Linz, Austria
Alexander Keller, NVIDIA Research, Germany
Gunther Leobacher, Johannes Kepler University, Linz, Austria
Art B. Owen, Stanford University, USA

Note that, while special sessions are already selected, including oneon Stochastic Gradient methods for Monte Carlo and Variational Inference, organised by Victor Elvira and Ingmar Schuster (my only contribution to this session being the suggestion they organise it!), proposals for contributed talks will be selected based on one-page abstracts, to be submitted by March 1.

simple, scalable and accurate posterior interval estimation

Posted in Statistics with tags , , , , , , , on July 6, 2016 by xi'an

“There is a lack of simple and scalable algorithms for uncertainty quantification.”

A paper by Cheng Li , Sanvesh Srivastava, and David Dunson that I had missed and which was pointed out on Andrew’s blog two days ago. As recalled in the very first sentence of the paper, above, the existing scalable MCMC algorithms somewhat fail to account for confidence (credible) intervals. In the sense that handling parallel samples does not naturally produce credible intervals.Since the approach is limited to one-dimensional quantity of interest, ζ=h(θ), the authors of the paper consider the MCMC approximations of the cdf of the said quantity ζ based on the manageable subsets like as many different approximations of the same genuine posterior distribution of that quantity ζ. (Corrected by a power of the likelihood but dependent on the particular subset used for the estimation.) The estimate proposed in the paper is a Wasserstein barycentre of the available estimations, barycentre that is defined as minimising the sum of the Wasserstein distances to all estimates. (Why should this measure be relevant: the different estimates may be of different quality). Interestingly (at least at a computational level), the solution is such that the quantile function of the Wasserstein barycentre is the average of the estimated quantiles functions. (And is there an alternative loss returning the median cdf?) A confidence interval based on the quantile function can then be directly derived. The paper shows that this Wasserstein barycentre converges to the true (marginal) posterior as the sample size m of each sample grows to infinity (and faster than 1/√m), with the strange side-result that the convergence is in 1/√n when the MLE of the global parameter θ is unbiased. Strange to me because unbiasedness is highly dependent on parametrisation while the performances of this estimator should not be, i.e., should be invariant under reparameterisation. Maybe this is due to ζ being a linear transform of θ in the convergence theorem… In any case, I find this question of merging cdf’s from poorly defined approximations to an unknown cdf of the highest interest and look forward any further proposal to this effect!

communication-efficient distributed statistical learning

Posted in Books, Statistics, University life with tags , , , , , , , , on June 10, 2016 by xi'an

mikecemMichael Jordan, Jason Lee, and Yun Yang just arXived a paper with their proposal on handling large datasets through distributed computing, thus contributing to the currently very active research topic of approximate solutions in large Bayesian models. The core of the proposal is summarised by the screenshot above, where the approximate likelihood replaces the exact likelihood with a first order Taylor expansion. The first term is the likelihood computed for a given subsample (or a given thread) at a ratio of one to N and the difference of the gradients is only computed once at a good enough guess. While the paper also considers M-estimators and non-Bayesian settings, the Bayesian part thus consists in running a regular MCMC when the log-target is approximated by the above. I first thought this proposal amounted to a Gaussian approximation à la Simon Wood or to an INLA approach but this is not the case: the first term of the approximate likelihood is exact and hence can be of any form, while the scalar product is linear in θ, providing a sort of first order approximation, albeit frozen at the chosen starting value.

mikecem2Assuming that each block of the dataset is stored on a separate machine, I think the approach could further be implemented in parallel, running N MCMC chains and comparing the output. With a post-simulation summary stemming from the N empirical distributions thus produced. I also wonder how the method would perform outside the fairly smooth logistic regression case, where the single sample captures well-enough the target. The picture above shows a minor gain in a misclassification rate that is already essentially zero.

merging MCMC subposteriors

Posted in Books, Statistics, University life with tags , , , , , , , on June 8, 2016 by xi'an

Christopher Nemeth and Chris Sherlock arXived a paper yesterday about an approach to distributed MCMC sampling via Gaussian processes. As in several other papers commented on the ‘Og, the issue is to merge MCMC samples from sub-posteriors into a sample or any sort of approximation of the complete (product) posterior. I am quite sympathetic to the approach adopted in this paper, namely to use a log-Gaussian process representation of each sub-posterior and then to replace each sub-posterior with its log-Gaussian process posterior expectation in an MCMC or importance scheme. And to assess its variability through the posterior variance of the sum of log-Gaussian processes. As pointed out by the authors the closed form representation of the posterior mean of the log-posterior is invaluable as it allows for an HMC implementation. And importance solutions as well. The probabilistic numerics behind this perspective are also highly relevant.

A few arguable (?) points:

  1. The method often relies on importance sampling and hence on the choice of an importance function that is most likely influential but delicate to calibrate in complex settings as I presume the Gaussian estimates are not useful in this regard;
  2. Using Monte Carlo to approximate the value of the approximate density at a given parameter value (by simulating from the posterior distribution) is natural but is it that efficient?
  3. It could be that, by treating all sub-posterior samples as noisy versions of the same (true) posterior, a more accurate approximation of this posterior could be constructed;
  4. The method relies on the exponentiation of a posterior expectation or simulation. As of yesterday, I am somehow wary of log-normal expectations!
  5. If the purpose of the exercise is to approximate univariate integrals, it would seem more profitable to use the Gaussian processes at the univariate level;
  6. The way the normalising missing constants and the duplicate simulations are processed (or not) could deserve further exploration;
  7. Computing costs are in fine unclear when compared with the other methods in the toolbox.

patterns of scalable Bayesian inference

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , on February 24, 2016 by xi'an

Elaine Angelino, Matthew Johnson and Ryan Adams just arXived a massive survey of 118 pages on scalable Bayesian inference, which could have been entitled Bayes for Big Data, as this monograph covers state-of-the-art computational approaches to large and complex data structures. I did not read each and every line of it, but I have already recommended it to my PhD students. Some of its material unsurprisingly draws from the recent survey by Rémi Bardenet et al. (2015) I discussed a while ago. It also relates rather frequently to the somewhat parallel ICML paper of Korattikara et al. (2014). And to the firefly Monte Carlo procedure also discussed previously here.

Chapter 2 provides some standard background on computational techniques, Chapter 3 covers MCMC with data subsets, Chapter 4 gives some entries on MCMC with parallel and distributed architectures, Chapter 5 focus on variational solutions, and Chapter 6 is about open questions and challenges.

“Insisting on zero asymptotic bias from Monte Carlo estimates of expectations may leave us swamped in errors from high variance or transient bias.”

One central theme of the paper is the need for approximate solutions, MCMC being perceived as the exact solution. (Somewhat wrongly in the sense that the product of an MCMC is at best an empirical version of the true posterior, hence endowed with a residual and incompressible variation for a given computing budget.) While Chapter 3 stresses the issue of assessing the distance to the true posterior, it does not dwell at all on computing times and budget, which is arguably a much harder problem. Chapter 4 seems to be more aware of this issue since arguing that “a way to use parallel computing resources is to run multiple sequential MCMC algorithms at once [but that this] does not reduce the transient bias in MCMC estimates of posterior expectations” (p.54). The alternatives are to use either prefetching (which was the central theme of Elaine Angelino’s thesis), asynchronous Gibbs with the new to me (?) Hogwild Gibbs algorithms (connected in Terenin et al.’s recent paper, not quoted in the paper), some versions of consensus Monte Carlo covered in earlier posts, the missing links being in my humble opinion an assessment of the worth of those solutions (in the spirit of “here’s the solution, what was the problem again?”) and once again the computing time issue. Chapter 5 briefly discusses some recent developments in variational mean field approximations, which is farther from my interests and (limited) competence, but which appears as a particular class of approximate models and thus could (and should?) relate to likelihood-free methods. Chapter 6 about the current challenges of the field is presumably the most interesting in this monograph in that it produces open questions and suggests directions for future research. For instance, opposing the long term MCMC error with the short term transient part. Or the issue of comparing different implementations in a practical and timely perspective.

Je reviendrai à Montréal [D-2]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on December 9, 2015 by xi'an

I have spent the day and more completing and compiling slides for my contrapuntal perspective on probabilistic numerics, back in Montréal, for the NIPS 2015 workshop of December 11 on this theme. As I presume the kind  invitation by the organisers was connected with my somewhat critical posts on the topic, I mostly  The day after, while I am flying back to London for the CFE (Computational and Financial Econometrics) workshop, somewhat reluctantly as there will be another NIPS workshop that day on scalable Monte Carlo.

Je veux revoir le long désert
Des rues qui n’en finissent pas
Qui vont jusqu’au bout de l’hiver
Sans qu’il y ait trace de pas