Archive for CREST

Bayesian regression trees [seminar]

Posted in pictures, Statistics, University life with tags , , , , , , , , , , on January 26, 2018 by xi'an
During her visit to Paris, Veronika Rockovà (Chicago Booth) will give a talk in ENSAE-CREST on the Saclay Plateau at 2pm. Here is the abstract
Posterior Concentration for Bayesian Regression Trees and Ensembles
(joint with Stephanie van der Pas)Since their inception in the 1980’s, regression trees have been one of the more widely used non-parametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet  susceptible to over-fitting.  Strategies against overfitting have traditionally relied on  pruning  greedily grown trees. The Bayesian framework offers an alternative remedy against overfitting through priors. Roughly speaking, a good prior  charges smaller trees where overfitting does not occur. While the consistency of random histograms, trees and their ensembles  has been studied quite extensively, the theoretical understanding of the Bayesian counterparts has  been  missing. In this paper, we take a step towards understanding why/when do Bayesian trees and their ensembles not overfit. To address this question, we study the speed at which the posterior concentrates around the true smooth regression function. We propose a spike-and-tree variant of the popular Bayesian CART prior and establish new theoretical results showing that  regression trees (and their ensembles) (a) are capable of recovering smooth regression surfaces, achieving optimal rates up to a log factor, (b) can adapt to the unknown level of smoothness and (c) can perform effective dimension reduction when p>n. These results  provide a piece of missing theoretical evidence explaining why Bayesian trees (and additive variants thereof) have worked so well in practice.

two Parisian talks by Pierre Jacob in January

Posted in pictures, Statistics, University life with tags , , , , , , , , , on December 21, 2017 by xi'an

While back in Paris from Harvard in early January, Pierre Jacob will give two talks on works of his:

January 09, 10:30, séminaire d’Analyse-Probabilités, Université Paris-Dauphine: Unbiased MCMC

Markov chain Monte Carlo (MCMC) methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, MCMC estimators are generally biased after any fixed number of iterations, which complicates both parallel computation and the construction of confidence intervals. We propose to remove this bias by using couplings of Markov chains and a telescopic sum argument, inspired by Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and confidence intervals can be directly constructed from the Central Limit Theorem for i.i.d. variables. We provide practical couplings for important algorithms such as the Metropolis-Hastings and Gibbs samplers. We establish the theoretical validity of the proposed estimators, and study their variances and computational costs. In numerical experiments, including inference in hierarchical models, bimodal or high-dimensional target distributions, logistic regressions with the Pólya-Gamma Gibbs sampler and the Bayesian Lasso, we demonstrate the wide applicability of the proposed methodology as well as its limitations. Finally, we illustrate how the proposed estimators can approximate the “cut” distribution that arises in Bayesian inference for misspecified models.

January 11, 10:30, CREST-ENSAE, Paris-Saclay: Better together? Statistical learning in models made of modules [Warning: Paris-Saclay is not in Paris!]

In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference or decision problem. It is convenient to use a graphical model to represent the statistical dependencies, via a set of connected “modules”, each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the update of others. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of “cut distributions”. In this talk, I will discuss why these modular approaches might be preferable to the full model in misspecified settings, and propose principled criteria to choose between modular and full-model approaches. The question is intertwined with computational difficulties associated with the cut distribution, and new approaches based on recently proposed unbiased MCMC methods will be described.

Long enough after the New Year festivities (if any) to be fully operational for them!

Les Rouquins

Posted in pictures, University life, Wines with tags , , , , , , , on October 20, 2017 by xi'an

Langevin on a wrong bend

Posted in Books, Statistics with tags , , , , , , , on October 19, 2017 by xi'an

Arnak Dalayan and Avetik Karagulyan (CREST) arXived a paper the other week on a focussed study of the Langevin algorithm [not MALA] when the gradient of the target is incorrect. With the following improvements [quoting non-verbatim from the paper]:

  1. a varying-step Langevin that reduces the number of iterations for a given Wasserstein precision, compared with recent results by e.g. Alan Durmus and Éric Moulines;
  2. an extension of convergence results for error-prone evaluations of the gradient of the target (i.e., the gradient is replaced with a noisy version, under some moment assumptions that do not include unbiasedness);
  3. a new second-order sampling algorithm termed LMCO’, with improved convergence properties.

What is particularly interesting to me in this setting is the use in all these papers of a discretised Langevin diffusion (a.k.a., random walk with a drift induced by the gradient of the log-target) without the original Metropolis correction. The results rely on an assumption of [strong?] log-concavity of the target, with “user-friendly” bounds on the Wasserstein distance depending on the constants appearing in this log-concavity constraint. And so does the adaptive step. (In the case of the noisy version, the bias and variance of the noise also matter. As pointed out by the authors, there is still applicability to scaling MCMC for large samples. Beyond pseudo-marginal situations.)

“…this, at first sight very disappointing behavior of the LMC algorithm is, in fact, continuously connected to the exponential convergence of the gradient descent.”

The paper concludes with an interesting mise en parallèle of Langevin algorithms and of gradient descent algorithms, since the convergence rates are the same.

probably ABC [and provably robust]

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , on August 8, 2017 by xi'an

Two weeks ago, James Ridgway (formerly CREST) arXived a paper on misspecification and ABC, a topic on which David Frazier, Judith Rousseau and I have been working for a while now [and soon to be arXived as well].  Paper that I re-read on a flight to Amsterdam [hence the above picture], written as a continuation of our earlier paper with David, Gael, and Judith. One specificity of the paper is to use an exponential distribution on the distance between the observed and simulated sample within the ABC distribution. Which reminds me of the resolution by Bissiri, Holmes, and Walker (2016) of the intractability of the likelihood function. James’ paper contains oracle inequalities between the ABC approximation and the genuine distribution of the summary statistics, like a bound on the distance between the expectations of the summary statistics under both models. Which writes down as a sum of a model bias, of two divergences between empirical and theoretical averages, on smoothness penalties, and on a prior impact term. And a similar bound on the distance between the expected distance to the oracle estimator of θ under the ABC distribution [and a Lipschitz type assumption also found in our paper]. Which first sounded weird [to me] as I would have expected the true posterior, until it dawned on me that the ABC distribution is the one used for the estimation [a passing strike of over-Bayesianism!]. While the oracle bound could have been used directly to discuss the rate of convergence of the exponential rate λ to zero [with the sample size n], James goes into the interesting alternative direction of setting a prior on λ, an idea that dates back to Olivier Catoni and Peter Grünwald. Or rather a pseudo-posterior on λ, a common occurrence in the PAC-Bayesian literature. In one of his results, James obtains a dependence of λ on the dimension m of the summary [as well as the root dependence on the sample size n], which seems to contradict our earlier independence result, until one realises this scale parameter is associated with a distance variable, itself scaled in m.

The paper also contains a non-parametric part, where the parameter θ is the unknown distribution of the data and the summary the data itself. Which is quite surprising as I did not deem it possible to handle non-parametrics with ABC. Especially in a misspecified setting (although I have trouble perceiving what this really means).

“We can use most of the Monte Carlo toolbox available in this context.”

The theoretical parts are a bit heavy on notations and hard to read [as a vacation morning read at least!]. They are followed by a Monte Carlo implementation using SMC-ABC.  And pseudo-marginals [at least formally as I do not see how the specific features of pseudo-marginals are more that an augmented representation here]. And adaptive multiple pseudo-samples that reminded me of the Biometrika paper of Anthony Lee and Krys Latuszynski (Warwick). Therefore using indeed most of the toolbox!

end of a long era [1982-2017]

Posted in Books, pictures, Running, University life with tags , , , , , , , , , , , on May 23, 2017 by xi'an

This afternoon I went to CREST to empty my office there from books and a few papers (like the original manuscript version of Monte Carlo Statistical Methods). This is because the research centre, along with the ENSAE graduate school (my Alma mater), is moving to a new building on the Saclay plateau, next to École Polytechnique. As part of this ambitious migration of engineering schools from downtown Paris to a brand new campus there. Without getting sentimental about this move, it means leaving the INSEE building in Malakoff, on the outskirts of downtown Paris, which has been an enjoyable part of my student and then academic life from 1982 till now. And also leaving the INSEE Paris Club runners! (I am quite uncertain about being as active at the new location, if only because going there by bike is a bit more of a challenge. To be addressed anyway!) And I left behind my accumulation of conference badges (although I should try to recycle them for the incoming BNP 11 in Paris!).

Alan Gelfand in Paris

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , on May 11, 2017 by xi'an

Alan Gelfand (Duke University) will be in Paris on the week of May 15 and give several seminars, including one at AgroParisTech on May 16:

Modèles hiérarchiques

and on at CREST (BiPS)  on May 18, 2pm:

Scalable Gaussian processes for analyzing space and space-time datasets