Archive for sparsity

oceanographers in Les Houches

Posted in Books, Kids, Mountains, pictures, Running, Statistics, Travel, University life on March 9, 2024 by xi'an


The first internal research workshop of our ERC Synergy project OCEAN is taking place in Les Houches, French Alps, this coming week, with 15 researchers gathering to brainstorm on some of the themes at the core of the project, such as algorithmic tools for multiple decision-making agents, Bayesian uncertainty quantification, and Bayesian learning under constraints (scarcity, fairness, privacy). Due to the small size of the workshop (which is perfect for engaging in joint work), it could not be hosted by the nearby, iconic École de Physique des Houches and will take place instead in a local hotel.

On the leisurely side, I hope there will be enough snow left for some lunch-time ski breaks [with no bone fracture à la Adapski!], or else that the nearby running trails will prove manageable.

Mostly Monte Carlo Xmas

Posted in Kids, pictures, Statistics, University life on December 7, 2023 by xi'an

The next, and last for 2023, instalment of our monthly series of Parisian seminars on the theory and practice of Monte Carlo in statistics and data science, run in conjunction with our ERC OCEAN project, will take place on Friday 15 December. The following seminars will be on 15 January and 12 February, with a further date in March.

4pm/16h CET: SVBMC: Fast post-processing Bayesian inference with noisy evaluations of the likelihood

Grégoire Clarté – University of Helsinki, University of Edinburgh

In many cases, the exact likelihood is unavailable and can only be accessed through a noisy and expensive process – for example, in plasma physics. Furthermore, Bayesian inference often comes in at a later stage, for example after running an optimization algorithm to find a MAP estimate. To tackle both these issues, we introduce Sparse Variational Bayesian Monte Carlo (SVBMC), a method for fast “post-process” Bayesian inference for models with black-box and noisy likelihoods. SVBMC reuses all existing target density evaluations – for example, from previous optimizations or partial Markov chain Monte Carlo runs – to build a sparse Gaussian process (GP) surrogate model of the log posterior density. Uncertain regions of the surrogate are then refined via active learning as needed. Our work builds on the Variational Bayesian Monte Carlo (VBMC) framework for sample-efficient inference, with several novel contributions. First, we make VBMC scalable to a large number of pre-existing evaluations via sparse GP regression, deriving novel Bayesian quadrature formulae and acquisition functions for active learning with sparse GPs. Second, we introduce noise shaping, a general technique to induce the sparse GP approximation to focus on high posterior density regions. Third, we prove theoretical results in support of the SVBMC refinement procedure. We validate our method on a variety of challenging synthetic scenarios and real-world applications. We find that SVBMC consistently builds good posterior approximations by post-processing existing model evaluations from different sources, often requiring only a small number of additional density evaluations.
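As a crude illustration of the surrogate idea behind the abstract, here is a minimal sketch of my own, not the SVBMC implementation: it uses a plain rather than sparse GP, a toy one-dimensional target, arbitrary kernel hyperparameters, and a naive upper-confidence acquisition, recycling pre-existing noisy log-density evaluations before refining the surrogate where it is high and uncertain.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy log-target, standing in for an expensive, noisy black-box log likelihood
def log_post(x):
    return -(x**2 - 1.0)**2 / 0.5

# pre-existing evaluations, e.g. recycled from an earlier optimisation run
X = rng.uniform(-2.5, 2.5, size=12)
y = log_post(X) + 0.05 * rng.normal(size=X.size)   # noisy evaluations

def rbf(a, b, ell=0.5, sig=3.0):
    # squared-exponential kernel matrix between two 1-D point sets
    return sig**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def gp_posterior(Xs, X, y, noise=0.05):
    # standard GP regression predictive mean and variance at the points Xs
    K = rbf(X, X) + noise**2 * np.eye(X.size)
    Ks = rbf(Xs, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf(Xs, Xs)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 1e-12)

# active-learning loop: refine the surrogate where it is both high and uncertain
grid = np.linspace(-2.5, 2.5, 400)
for _ in range(5):
    mean, var = gp_posterior(grid, X, y)
    acq = mean + 2.0 * np.sqrt(var)          # crude upper-confidence acquisition
    x_new = grid[np.argmax(acq)]
    X = np.append(X, x_new)
    y = np.append(y, log_post(x_new) + 0.05 * rng.normal())

mean, var = gp_posterior(grid, X, y)
print("surrogate mode near", grid[np.argmax(mean)])
```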

5pm/17h CET: Variance reduction using control variates and importance sampling for applications in computational statistical physics

Urbain Vaes – INRIA, CERMICS

The scaling of the mobility coefficient associated with two-dimensional Langevin dynamics in a periodic potential as the friction vanishes is not well understood. Theoretical results are lacking, and numerical calculation of the mobility in the underdamped regime is challenging. In the first part of this talk, I will present a new variance reduction approach based on control variates for efficiently estimating the mobility of Langevin-type dynamics, together with numerical experiments illustrating the performance of the approach.
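As a reminder of the control-variate principle invoked in this first part (a generic sketch of mine, not the Langevin mobility estimator of the talk; the functions f and g and the Gaussian reference measure are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# estimate E[f(X)] for X ~ N(0,1), using g(X) = X with known mean 0 as control
f = lambda x: np.exp(0.5 * x)     # quantity of interest (toy choice)
g = lambda x: x                   # control variate, E[g(X)] = 0 exactly

x = rng.normal(size=100_000)
fx, gx = f(x), g(x)

# (near-)optimal coefficient c* = Cov(f, g) / Var(g), estimated from the sample
c = np.cov(fx, gx)[0, 1] / np.var(gx)

controlled = fx - c * gx          # same expectation, smaller variance

print("plain estimate  :", fx.mean(), " sd:", fx.std(ddof=1) / np.sqrt(x.size))
print("control variate :", controlled.mean(), " sd:", controlled.std(ddof=1) / np.sqrt(x.size))
print("exact value     :", np.exp(0.125))   # E[exp(X/2)] = exp(1/8)
```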

In the second part of this talk, we study an importance sampling approach for calculating averages with respect to multimodal probability distributions. Traditional Markov chain Monte Carlo methods to this end, which are based on time averages along a realization of a Markov process ergodic with respect to the target probability distribution, are usually plagued by a large variance due to the metastability of the process. The estimator we study is based on an ergodic average along a realization of an overdamped Langevin process for a modified potential. We obtain an explicit expression for the optimal biasing potential in dimension 1 and propose a general numerical approach for approximating the optimal potential in the multi-dimensional setting.
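And a one-dimensional caricature of the second part, again a sketch of my own: the double-well potential, the hand-picked biasing potential U (rather than the optimal potential derived by the authors), and the use of an unadjusted Langevin scheme with its small discretisation bias are all assumptions made for illustration. Sampling the modified potential V+U lowers the barrier between the modes, and the self-normalised importance weights exp(U) restore averages under the original target.

```python
import numpy as np

rng = np.random.default_rng(2)

# bimodal target pi(x) ∝ exp(-V(x)) with a double-well potential
V  = lambda x: 4.0 * (x**2 - 1.0)**2
dV = lambda x: 16.0 * x * (x**2 - 1.0)

# hand-picked biasing potential flattening the barrier (not the optimal one)
U  = lambda x: -3.0 * (x**2 - 1.0)**2
dU = lambda x: -12.0 * x * (x**2 - 1.0)

# overdamped (unadjusted) Langevin dynamics for the biased potential V + U
dt, n = 1e-3, 200_000
x = np.empty(n)
x[0] = 1.0
noise = rng.normal(size=n - 1)
for k in range(n - 1):
    x[k + 1] = x[k] - dt * (dV(x[k]) + dU(x[k])) + np.sqrt(2.0 * dt) * noise[k]

# self-normalised importance weights w ∝ pi / pi_biased ∝ exp(U)
logw = U(x)
w = np.exp(logw - logw.max())
w /= w.sum()

# reweighted average of an observable, here f(x) = x**2
print("biased sample mean of x^2 :", np.mean(x**2))
print("importance-sampling E[x^2]:", np.sum(w * x**2))
```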

optimal choice among MCMC kernels

Posted in Statistics on March 14, 2019 by xi'an

Last week in Siem Reap, Florian Maire [whom I discovered originates from a Norman town less than 10km from my hometown!] presented at the Data Science & Finance conference in Cambodia an arXived joint work with Pierre Vandekerkhove that considers the following problem: given a large collection of MCMC kernels, how to pick the best one, and how to define what “best” means. Going by mixtures is a default way of exploiting the collection, as shown for instance in Tierney (1994), since the mixture improves on the individual kernels (especially when a kernel is not irreducible on its own!). This paper considers a move to local weights in the mixture, weights that are not estimated from earlier simulations, contrary to what I first understood.

As made clearer in the paper, the focus is on filamentary distributions that are concentrated near lower-dimensional sets or manifolds, since the components of the kernel collection can then be restricted to directions along these manifolds… This includes an interesting case of a highly peaked 2-D target where converging means mostly simulating in x¹ while covering the target means mostly simulating in x², exhibiting a schizophrenic tension between the two goals. Locally dependent weights call for a correction by a Metropolis step, at a cost of O(n). What of Rao-Blackwellisation of these mixture weights, from weight × transition to the full mixture, as in our PMC paper? Also unclear to me [during the talk] was the use in the mixture of basic Metropolis kernels, which are not absolutely continuous because of their Dirac mass component, but this is clarified by Section 5 in the paper. A surprising result from the paper (Corollary 1) is that the use of local weights ω(i,x) that depend on the current value of the chain does not jeopardize the stationary measure π(·) of the mixture chain. Which may be due to the fact that all components of the mixture are already π-invariant, or to the index of the kernel constituting an auxiliary (if ancillary) variate. (Algorithm 1 in the paper reminds me of delayed acceptance, making me wonder whether computing time should be accounted for.) A final question I briefly discussed with Florian is the extension to weights that are automatically constructed from the simulations and the target.
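For the record, here is one standard way, sketched by me rather than taken from Maire & Vandekerkhove, of keeping a locally weighted mixture of random-walk proposals π-invariant via a single Metropolis-Hastings correction: the state-dependent mixture of proposal densities is treated as one proposal, which indeed requires evaluating all component densities at both the current and proposed values, hence the O(n) cost. The target, the two proposal scales, and the weight function ω are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy bimodal target density (unnormalised)
def pi(x):
    return np.exp(-0.5 * (x - 3.0)**2) + np.exp(-0.5 * (x + 3.0)**2)

# two Gaussian random-walk proposal kernels with different scales
scales = np.array([0.3, 5.0])

def q_i(y, x, i):
    s = scales[i]
    return np.exp(-0.5 * ((y - x) / s)**2) / (s * np.sqrt(2 * np.pi))

# local weights omega(i, x): favour the small step near a mode, the big step in between
def omega(x):
    w_small = np.exp(-0.5 * min((x - 3.0)**2, (x + 3.0)**2))
    w = np.array([w_small, 1.0 - 0.9 * w_small])
    return w / w.sum()

def q_mix(y, x):
    # full mixture proposal density q(y | x) = sum_i omega(i, x) q_i(y | x)
    w = omega(x)
    return sum(w[i] * q_i(y, x, i) for i in range(len(scales)))

# Metropolis-Hastings with the locally weighted mixture as proposal
x, chain = 3.0, []
for _ in range(50_000):
    w = omega(x)
    i = rng.choice(len(scales), p=w)        # pick a kernel locally
    y = x + scales[i] * rng.normal()        # propose from it
    # correction uses the full mixture density at both x and y (O(n) kernels)
    a = pi(y) * q_mix(x, y) / (pi(x) * q_mix(y, x))
    if rng.random() < a:
        x = y
    chain.append(x)

print("mean (should be near 0):", np.mean(chain))
print("time in right mode     :", np.mean(np.array(chain) > 0.0))
```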

JSM 2018 [#1]

Posted in Mountains, Statistics, Travel, University life on July 30, 2018 by xi'an

As our direct flight from Paris landed in the morning in Vancouver, we found ourselves in the unusual situation of having a few hours to kill before accessing our rental, and where better to spend them than at a general introduction to deep learning in the first round of sessions at JSM 2018?! In my humble opinion, or maybe just because it was past midnight Paris time!, the talk was pretty uninspiring in that it missed the natural question of the possible connections between the construction of a prediction function and statistics. Watching improving performances at classifying human faces does not tell much more than that one has built a massively non-linear function in high dimensions with nicely designed error penalties. Most of the talk droned on about neural networks and their fitting by back-propagation and variations on stochastic gradient descent, without addressing rather natural (?) questions about the choice of functions at each level, of the number of levels, of the penalty term or regulariser, and even less the reason why no sparsity is imposed on the structure, despite the humongous number of parameters involved. What came close [but not that close] to sparsity is the notion of dropout, which is a sort of purely automated culling of the nodes, and which was new to me: more like a sort of randomisation that turns the optimisation criterion into an average. Only at the end of the presentation did more relevant questions emerge, presenting unsupervised learning as density estimation, the pivot being the generative features of (most) statistical models. And GANs of course. But the talk nonetheless missed an explanation as to why models with massive numbers of parameters can be considered in this setting and not in standard statistics. (One slide about deterministic auto-encoders was somewhat puzzling in that it seemed to repeat the “fiducial mistake”.)
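For readers who, like me, met dropout for the first time there, a minimal (inverted-dropout) sketch of the idea, with arbitrary layer sizes and rate: nodes are randomly zeroed at training time, so the training objective becomes an average over random sub-networks, while the full network is used at test time.

```python
import numpy as np

rng = np.random.default_rng(4)

def dropout(h, rate, training):
    """Inverted dropout: randomly zero activations at training time and
    rescale the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

# one hidden layer with dropout, purely as an illustration
x = rng.normal(size=(8, 20))           # batch of 8 inputs of dimension 20
W = rng.normal(size=(20, 50)) * 0.1
h = np.maximum(x @ W, 0.0)             # ReLU activations
h_train = dropout(h, rate=0.5, training=True)    # a random sub-network
h_test  = dropout(h, rate=0.5, training=False)   # full network at test time

print(h_train.shape, h_test.shape)
```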

expectation-propagation from Les Houches

Posted in Books, Mountains, pictures, Statistics, University life on February 3, 2016 by xi'an

As CHANCE book editor, I received the other day from Oxford University Press the proceedings of an École de Physique des Houches session on Statistical Physics, Optimisation, Inference, and Message-Passing Algorithms that took place there from September 30 to October 11, 2013. While it is mostly unrelated to Statistics, and since Igor Carron already reviewed the book more than a year ago, I skimmed through the few chapters connected to my interests, from Devavrat Shah’s chapter on graphical models and belief propagation, to Andrea Montanari‘s denoising and sparse regression, including LASSO, and only read in some detail Manfred Opper’s expectation propagation chapter. This chapter made me realise (or re-realise, as I had presumably forgotten an earlier explanation!) that expectation propagation can be seen as a sort of variational approximation that produces, through a sequence of iterations, the distribution within a certain parametric (exponential) family that is closest to the distribution of interest. By writing the Kullback-Leibler divergence the opposite way from the usual variational approximation, the solution equates the expectations of the natural sufficient statistic under both distributions… Another interesting aspect of this chapter is the connection with estimating normalising constants. (I noticed a slight typo on p.269 in the final form of the Kullback approximation q(·) to p(·).)
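In symbols (my own paraphrase of the argument, with q restricted to an exponential family with natural sufficient statistic t):

```latex
% usual variational Bayes: minimise KL(q || p) over the approximating family Q
\widehat{q}_{\mathrm{VB}} \;=\; \arg\min_{q \in \mathcal{Q}}\;
  \int q(x)\,\log\frac{q(x)}{p(x)}\,\mathrm{d}x ,
\qquad
% expectation propagation: the divergence written the other way
\widehat{q}_{\mathrm{EP}} \;=\; \arg\min_{q \in \mathcal{Q}}\;
  \int p(x)\,\log\frac{p(x)}{q(x)}\,\mathrm{d}x .

% for Q an exponential family, q_eta(x) \propto \exp\{\eta^\top t(x)\},
% the EP solution matches the expectation of the natural sufficient statistic:
\mathbb{E}_{\widehat{q}_{\mathrm{EP}}}\!\left[t(X)\right]
  \;=\; \mathbb{E}_{p}\!\left[t(X)\right].
```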