Archive for scalability
at the Isaac Newton Institute
Posted in Statistics, Travel, University life with tags Cambridge University, England, INI, Isaac Newton Institute, scalability, scalable MCMC on July 6, 2017 by xi'anMCM 2017
Posted in pictures, Statistics, Travel, University life with tags Approximate Bayesian computation, Canada, MCMC, Monte Carlo integration, Monte Carlo Statistical Methods, Montréal, probabilistic numerics, Québec, Robert Charlebois, scalability, stochastic gradient on February 10, 2017 by xi'anJe reviendrai à Montréal, as the song by Robert Charlebois goes, for the MCM 2017 meeting there, on July 3-7. I was invited to give a plenary talk by the organisers of the conference . Along with
Steffen Dereich, WWU Münster, Germany
Paul Dupuis, Brown University, Providence, USA
Mark Girolami, Imperial College London, UK
Emmanuel Gobet, École Polytechnique, Palaiseau, France
Aicke Hinrichs, Johannes Kepler University, Linz, Austria
Alexander Keller, NVIDIA Research, Germany
Gunther Leobacher, Johannes Kepler University, Linz, Austria
Art B. Owen, Stanford University, USA
Note that, while special sessions are already selected, including oneon Stochastic Gradient methods for Monte Carlo and Variational Inference, organised by Victor Elvira and Ingmar Schuster (my only contribution to this session being the suggestion they organise it!), proposals for contributed talks will be selected based on one-page abstracts, to be submitted by March 1.
communication-efficient distributed statistical learning
Posted in Books, Statistics, University life with tags approximate likelihood, big data, distributed computing, logistic regression, M-estimation, MCMC, scalability, tall data, Taylor expansion on June 10, 2016 by xi'anMichael Jordan, Jason Lee, and Yun Yang just arXived a paper with their proposal on handling large datasets through distributed computing, thus contributing to the currently very active research topic of approximate solutions in large Bayesian models. The core of the proposal is summarised by the screenshot above, where the approximate likelihood replaces the exact likelihood with a first order Taylor expansion. The first term is the likelihood computed for a given subsample (or a given thread) at a ratio of one to N and the difference of the gradients is only computed once at a good enough guess. While the paper also considers M-estimators and non-Bayesian settings, the Bayesian part thus consists in running a regular MCMC when the log-target is approximated by the above. I first thought this proposal amounted to a Gaussian approximation à la Simon Wood or to an INLA approach but this is not the case: the first term of the approximate likelihood is exact and hence can be of any form, while the scalar product is linear in θ, providing a sort of first order approximation, albeit frozen at the chosen starting value.
Assuming that each block of the dataset is stored on a separate machine, I think the approach could further be implemented in parallel, running N MCMC chains and comparing the output. With a post-simulation summary stemming from the N empirical distributions thus produced. I also wonder how the method would perform outside the fairly smooth logistic regression case, where the single sample captures well-enough the target. The picture above shows a minor gain in a misclassification rate that is already essentially zero.
merging MCMC subposteriors
Posted in Books, Statistics, University life with tags computing cost, Gaussian processes, importance sampling, log-normal distribution, Monte Carlo approximations, Monte Carlo Statistical Methods, scalability, sub-posteriors on June 8, 2016 by xi'anChristopher Nemeth and Chris Sherlock arXived a paper yesterday about an approach to distributed MCMC sampling via Gaussian processes. As in several other papers commented on the ‘Og, the issue is to merge MCMC samples from sub-posteriors into a sample or any sort of approximation of the complete (product) posterior. I am quite sympathetic to the approach adopted in this paper, namely to use a log-Gaussian process representation of each sub-posterior and then to replace each sub-posterior with its log-Gaussian process posterior expectation in an MCMC or importance scheme. And to assess its variability through the posterior variance of the sum of log-Gaussian processes. As pointed out by the authors the closed form representation of the posterior mean of the log-posterior is invaluable as it allows for an HMC implementation. And importance solutions as well. The probabilistic numerics behind this perspective are also highly relevant.
A few arguable (?) points:
- The method often relies on importance sampling and hence on the choice of an importance function that is most likely influential but delicate to calibrate in complex settings as I presume the Gaussian estimates are not useful in this regard;
- Using Monte Carlo to approximate the value of the approximate density at a given parameter value (by simulating from the posterior distribution) is natural but is it that efficient?
- It could be that, by treating all sub-posterior samples as noisy versions of the same (true) posterior, a more accurate approximation of this posterior could be constructed;
- The method relies on the exponentiation of a posterior expectation or simulation. As of yesterday, I am somehow wary of log-normal expectations!
- If the purpose of the exercise is to approximate univariate integrals, it would seem more profitable to use the Gaussian processes at the univariate level;
- The way the normalising missing constants and the duplicate simulations are processed (or not) could deserve further exploration;
- Computing costs are in fine unclear when compared with the other methods in the toolbox.