Archive for ENSAE

controlled sequential Monte Carlo [BiPS seminar]

Posted in Statistics with tags , , , , , , , on June 5, 2018 by xi'an

The last BiPS seminar of the semester will be given by Jeremy Heng (Harvard) on Monday 11 June at 2pm, in room 3001, ENSAE, Paris-Saclay about his Controlled sequential Monte Carlo paper:

Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques to approximate high-dimensional probability distributions and their normalizing constants. They have found numerous applications in statistics and related fields as they can be applied to perform state estimation for non-linear non-Gaussian state space models and Bayesian inference for complex static models. Like many Monte Carlo sampling schemes, they rely on proposal distributions which have a crucial impact on their performance. We introduce here a class of controlled sequential Monte Carlo algorithms, where the proposal distributions are determined by approximating the solution to an associated optimal control problem using an iterative scheme. We provide theoretical analysis of our proposed methodology and demonstrate significant gains over state-of-the-art methods at a fixed computational complexity on a variety of applications.

rage against the [Nature] Machine [Intelligence]

Posted in Books, Statistics, University life with tags , , , , , , , , , on May 15, 2018 by xi'an

Yesterday evening, my friend and colleague Pierre Alquier (CREST-ENSAE) got interviewed (for a few seconds on-line!, around minute 06) by the French national radio, France Culture, about the recent call to boycott the incoming Nature Machine Intelligence electronic journal. Call to the machine learning community, based on the lack of paying journals among the major machine learnings journals, like JMLR. Meaning that related conferences like AISTATS and NIPS also get their accepted papers available on-line for free. As noted in the call

“Machine learning has been at the forefront of the movement for free and open access to research. For example, in 2001 the Editorial Board of the Machine Learning Journal resigned en masse to form a new zero-cost open access journal, the Journal of Machine Learning Research (JMLR).”

Bayesian regression trees [seminar]

Posted in pictures, Statistics, University life with tags , , , , , , , , , , on January 26, 2018 by xi'an
During her visit to Paris, Veronika Rockovà (Chicago Booth) will give a talk in ENSAE-CREST on the Saclay Plateau at 2pm. Here is the abstract
Posterior Concentration for Bayesian Regression Trees and Ensembles
(joint with Stephanie van der Pas)Since their inception in the 1980’s, regression trees have been one of the more widely used non-parametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet  susceptible to over-fitting.  Strategies against overfitting have traditionally relied on  pruning  greedily grown trees. The Bayesian framework offers an alternative remedy against overfitting through priors. Roughly speaking, a good prior  charges smaller trees where overfitting does not occur. While the consistency of random histograms, trees and their ensembles  has been studied quite extensively, the theoretical understanding of the Bayesian counterparts has  been  missing. In this paper, we take a step towards understanding why/when do Bayesian trees and their ensembles not overfit. To address this question, we study the speed at which the posterior concentrates around the true smooth regression function. We propose a spike-and-tree variant of the popular Bayesian CART prior and establish new theoretical results showing that  regression trees (and their ensembles) (a) are capable of recovering smooth regression surfaces, achieving optimal rates up to a log factor, (b) can adapt to the unknown level of smoothness and (c) can perform effective dimension reduction when p>n. These results  provide a piece of missing theoretical evidence explaining why Bayesian trees (and additive variants thereof) have worked so well in practice.

two Parisian talks by Pierre Jacob in January

Posted in pictures, Statistics, University life with tags , , , , , , , , , on December 21, 2017 by xi'an

While back in Paris from Harvard in early January, Pierre Jacob will give two talks on works of his:

January 09, 10:30, séminaire d’Analyse-Probabilités, Université Paris-Dauphine: Unbiased MCMC

Markov chain Monte Carlo (MCMC) methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, MCMC estimators are generally biased after any fixed number of iterations, which complicates both parallel computation and the construction of confidence intervals. We propose to remove this bias by using couplings of Markov chains and a telescopic sum argument, inspired by Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and confidence intervals can be directly constructed from the Central Limit Theorem for i.i.d. variables. We provide practical couplings for important algorithms such as the Metropolis-Hastings and Gibbs samplers. We establish the theoretical validity of the proposed estimators, and study their variances and computational costs. In numerical experiments, including inference in hierarchical models, bimodal or high-dimensional target distributions, logistic regressions with the Pólya-Gamma Gibbs sampler and the Bayesian Lasso, we demonstrate the wide applicability of the proposed methodology as well as its limitations. Finally, we illustrate how the proposed estimators can approximate the “cut” distribution that arises in Bayesian inference for misspecified models.

January 11, 10:30, CREST-ENSAE, Paris-Saclay: Better together? Statistical learning in models made of modules [Warning: Paris-Saclay is not in Paris!]

In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference or decision problem. It is convenient to use a graphical model to represent the statistical dependencies, via a set of connected “modules”, each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the update of others. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of “cut distributions”. In this talk, I will discuss why these modular approaches might be preferable to the full model in misspecified settings, and propose principled criteria to choose between modular and full-model approaches. The question is intertwined with computational difficulties associated with the cut distribution, and new approaches based on recently proposed unbiased MCMC methods will be described.

Long enough after the New Year festivities (if any) to be fully operational for them!

Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution?

Posted in Statistics with tags , , , , , , , , on October 27, 2017 by xi'an

I found this question on X validated somewhat hilarious, the more because of the shouted KNOW! And the confused impression that because one can write down π(θ|x) up to a constant, one KNOWS this distribution… It is actually one of the paradoxes of simulation that, from a mathematical perspective, once π(θ|x) is available as a function of (θ,x), all other quantities related with this distribution are mathematically perfectly and uniquely defined. From a numerical perspective, this does not help. Actually, when starting my MCMC course at ENSAE a few days later, I had the same question from a student who thought facing a density function like

f(x) ∞ exp{-||x||²-||x||⁴-||x||⁶}

was enough to immediately produce simulations from this distribution. (I also used this example to show the degeneracy of accept-reject as the dimension d of x increases, using for instance a Gamma proposal on y=||x||. The acceptance probability plunges to zero with d, with 9 acceptances out of 10⁷ for d=20.)

a position served on a plate [ENSAE ParisTech, France]

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , on October 6, 2017 by xi'an

Nicolas Chopin emailed me about the opening of a position of Professor of Statistics at [my alma mater] ENSAE, on the Paris-Saclay campus [and plateau], next to Polytechnique, Telecom, and a bunch of other engineer schools [a.k.a The French MIT!]. The largest concentration of Science majors in France, definitely to be considered for a posiiton in France! Deadline is quite soon, as November 1. [Pardon my French: The pun in the title sort of fizzled out in translation because served on a plate is equivalent to served on a plateau in French.]