**O**ne of my students in my MCMC course at ENSAE seems to specialise in spotting typos in the Monte Carlo Statistical Methods book, as he has found an issue in every problem he solved! He even went back to a 1991 paper of mine on Inverse Normal distributions, inspired by a discussion with an astronomer, Caroline Soubiran, and my two colleagues, Gilles Celeux and Jean Diebolt. The above derivation from the massive Gradsteyn and Ryzhik (which I discovered thanks to Mary Ellen Bock when arriving at Purdue) is indeed incorrect, as the final term should be the square root of 2β rather than of 8β. However, this typo does not impact the normalising constant of the density, K(α,μ,τ), unless I am further confused.

## Archive for ENSAE

## I thought I did make a mistake but I was wrong…

Posted in Books, Kids, Statistics with tags Charles M. Schulz, confluent hypergeometric function, course, ENSAE, exercises, Gradsteyn, inverse normal distribution, MCMC, mixtures, Monte Carlo Statistical Methods, Peanuts, Ryzhik, typos on November 14, 2018 by xi'an

## controlled sequential Monte Carlo [BiPS seminar]

Posted in Statistics with tags BiPS, ENSAE, Harvard University, Monte Carlo Statistical Methods, Paris-Saclay campus, seminar, sequential Monte Carlo, SMC on June 5, 2018 by xi'an

**T**he last BiPS seminar of the semester will be given by Jeremy Heng (Harvard) on Monday 11 June at 2pm, in room 3001, ENSAE, Paris-Saclay, about his Controlled sequential Monte Carlo paper:

Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques to approximate high-dimensional probability distributions and their normalizing constants. They have found numerous applications in statistics and related fields as they can be applied to perform state estimation for non-linear non-Gaussian state space models and Bayesian inference for complex static models. Like many Monte Carlo sampling schemes, they rely on proposal distributions which have a crucial impact on their performance. We introduce here a class of controlled sequential Monte Carlo algorithms, where the proposal distributions are determined by approximating the solution to an associated optimal control problem using an iterative scheme. We provide theoretical analysis of our proposed methodology and demonstrate significant gains over state-of-the-art methods at a fixed computational complexity on a variety of applications.
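For readers wanting a toy picture of the plain (uncontrolled) algorithm the paper improves upon, here is a minimal bootstrap particle filter estimating the normalising constant of a made-up linear-Gaussian state-space model, checked against the exact value from the Kalman filter. The model, its parameters, and the particle numbers are my own illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy linear-Gaussian state-space model (illustrative choice):
#   x_t = phi * x_{t-1} + N(0, q^2),   y_t = x_t + N(0, r^2)
phi, q, r, T = 0.9, 1.0, 1.0, 50

# simulate a data record y_{1:T}
x = np.zeros(T)
y = np.zeros(T)
x[0] = rng.normal(0, q)
y[0] = x[0] + rng.normal(0, r)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal(0, q)
    y[t] = x[t] + rng.normal(0, r)

def bootstrap_pf(y, n_particles=2000):
    """Bootstrap particle filter estimate of log p(y_{1:T})."""
    N = n_particles
    particles = rng.normal(0, q, N)          # sample from the prior at t=0
    logZ = 0.0
    for t in range(T):
        if t > 0:                            # propagate through the dynamics
            particles = phi * particles + rng.normal(0, q, N)
        # weight by the Gaussian observation density
        logw = -0.5 * ((y[t] - particles) / r) ** 2 - np.log(r * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        logZ += m + np.log(w.mean())         # running normalising-constant estimate
        idx = rng.choice(N, N, p=w / w.sum())  # multinomial resampling
        particles = particles[idx]
    return logZ

def kalman_loglik(y):
    """Exact log-likelihood via the Kalman filter, for comparison."""
    mu, P, ll = 0.0, q**2, 0.0
    for t in range(T):
        S = P + r**2                         # innovation variance
        ll += -0.5 * ((y[t] - mu) ** 2 / S + np.log(2 * np.pi * S))
        K = P / S                            # Kalman gain
        mu = mu + K * (y[t] - mu)
        P = (1 - K) * P
        mu, P = phi * mu, phi**2 * P + q**2  # predict the next state
    return ll

print(bootstrap_pf(y), kalman_loglik(y))
```

With a few thousand particles the two numbers agree closely on this easy model; the controlled SMC of the paper is precisely about choosing better proposals when they do not.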

## rage against the [Nature] Machine [Intelligence]

Posted in Books, Statistics, University life with tags academic journals, AISTATS, CREST, ENSAE, JMLR, machine learning, Nature, Nature Machine Intelligence, NIPS, rage against the machine on May 15, 2018 by xi'an

**Y**esterday evening, my friend and colleague Pierre Alquier (CREST-ENSAE) was interviewed (for a few seconds on-line!, around minute 06) by the French national radio, France Culture, about the recent call to boycott the upcoming Nature Machine Intelligence electronic journal. A call to the machine learning community, based on the absence of paywalled journals among the major machine learning venues, like JMLR. Meaning that related conferences like AISTATS and NIPS also make their accepted papers available on-line for free. As noted in the call

“Machine learning has been at the forefront of the movement for free and open access to research. For example, in 2001 the Editorial Board of the Machine Learning Journal resigned en masse to form a new zero-cost open access journal, the Journal of Machine Learning Research (JMLR).”

## Bayesian regression trees [seminar]

Posted in pictures, Statistics, University life with tags Bayesian CART, Bayesian inference, CREST, ENSAE, overfitting, Paris-Saclay campus, random histogram, regression trees, seminar, talk, tree on January 26, 2018 by xi'an

**D**uring her visit to Paris, Veronika Rockovà (Chicago Booth) will give a talk at ENSAE-CREST on the Saclay Plateau at 2pm. Here is the abstract:

**Posterior Concentration for Bayesian Regression Trees and Ensembles**

(joint with Stephanie van der Pas)

Since their inception in the 1980s, regression trees have been one of the more widely used non-parametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet susceptible to over-fitting. Strategies against overfitting have traditionally relied on pruning greedily grown trees. The Bayesian framework offers an alternative remedy against overfitting through priors. Roughly speaking, a good prior charges smaller trees where overfitting does not occur. While the consistency of random histograms, trees and their ensembles has been studied quite extensively, the theoretical understanding of the Bayesian counterparts has been missing. In this paper, we take a step towards understanding why/when Bayesian trees and their ensembles do not overfit. To address this question, we study the speed at which the posterior concentrates around the true smooth regression function. We propose a spike-and-tree variant of the popular Bayesian CART prior and establish new theoretical results showing that regression trees (and their ensembles) (a) are capable of recovering smooth regression surfaces, achieving optimal rates up to a log factor, (b) can adapt to the unknown level of smoothness and (c) can perform effective dimension reduction when p>n. These results provide a piece of missing theoretical evidence explaining why Bayesian trees (and additive variants thereof) have worked so well in practice.

## two Parisian talks by Pierre Jacob in January

Posted in pictures, Statistics, University life with tags coupling, CREST, cut models, ENSAE, Gibbs sampling, MCMC, Paris-Saclay campus, Pierre Jacob, prior construction, Université Paris Dauphine on December 21, 2017 by xi'an

**W**hile back in Paris from Harvard in early January, Pierre Jacob will give two talks on works of his:

January 09, 10:30, séminaire d’Analyse-Probabilités, Université Paris-Dauphine: Unbiased MCMC

*Markov chain Monte Carlo (MCMC) methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, MCMC estimators are generally biased after any fixed number of iterations, which complicates both parallel computation and the construction of confidence intervals. We propose to remove this bias by using couplings of Markov chains and a telescopic sum argument, inspired by Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and confidence intervals can be directly constructed from the Central Limit Theorem for i.i.d. variables. We provide practical couplings for important algorithms such as the Metropolis-Hastings and Gibbs samplers. We establish the theoretical validity of the proposed estimators, and study their variances and computational costs. In numerical experiments, including inference in hierarchical models, bimodal or high-dimensional target distributions, logistic regressions with the Pólya-Gamma Gibbs sampler and the Bayesian Lasso, we demonstrate the wide applicability of the proposed methodology as well as its limitations. Finally, we illustrate how the proposed estimators can approximate the “cut” distribution that arises in Bayesian inference for misspecified models.*

January 11, 10:30, CREST-ENSAE, Paris-Saclay: Better together? Statistical learning in models made of modules *[Warning: Paris-Saclay is not in Paris!]*

*In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference or decision problem. It is convenient to use a graphical model to represent the statistical dependencies, via a set of connected “modules”, each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the update of others. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of “cut distributions”. In this talk, I will discuss why these modular approaches might be preferable to the full model in misspecified settings, and propose principled criteria to choose between modular and full-model approaches. The question is intertwined with computational difficulties associated with the cut distribution, and new approaches based on recently proposed unbiased MCMC methods will be described*.
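To make the “cut” idea concrete, here is a small two-module Gaussian sketch of my own (the model, priors, and numbers are illustrative assumptions, not from the talk): module 1 informs θ₁ directly, module 2 involves θ₁ plus a bias parameter θ₂ and its data are contaminated; the cut distribution updates θ₁ from module 1 only, while the full posterior lets the contaminated module shift θ₁.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical two-module model (illustrative, not from the talk):
#   module 1: y1_i ~ N(theta1, 1)                            -- trusted
#   module 2: y2_j ~ N(theta1 + theta2, 1), theta2 ~ N(0, tau²) -- misspecified
theta1_true = 0.0
n1, n2 = 50, 200
tau = 0.1                                      # tight prior sd on the bias parameter
y1 = rng.normal(theta1_true, 1, n1)
y2 = rng.normal(theta1_true + 2.0, 1, n2)      # module-2 data carry an unmodelled +2 bias

# cut distribution: theta1 is updated by module 1 alone (flat prior)
theta1_cut = rng.normal(y1.mean(), 1 / np.sqrt(n1), 10_000)
# then theta2 | theta1, y2 in a second stage, with no feedback on theta1
prec2 = n2 + 1 / tau**2
theta2_cut = rng.normal(n2 * (y2.mean() - theta1_cut) / prec2, 1 / np.sqrt(prec2))

# full posterior on theta1: integrating theta2 out gives y2bar | theta1 ~ N(theta1, 1/n2 + tau²)
v2 = 1 / n2 + tau**2
prec_full = n1 + 1 / v2
mean_full = (n1 * y1.mean() + y2.mean() / v2) / prec_full
theta1_full = rng.normal(mean_full, 1 / np.sqrt(prec_full), 10_000)

print(theta1_cut.mean(), theta1_full.mean())   # the full posterior is dragged by the bias
```

The cut samples for θ₁ stay centred on the module-1 data, while the full posterior is pulled toward the contaminated module, which is the contamination phenomenon motivating modular approaches.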

Long enough after the New Year festivities (if any) to be fully operational for them!

## the Norse farce [amazoning]

Posted in Statistics with tags Amazon, amazon associates, backpacking, biking, ENSAE, logo, mug, North Face, puppies, werewolf on November 30, 2017 by xi'an

*[Warning: This (silly) post is completely unrelated to the previous one!]*

**L**ast week, when biking back from ENSAE on the Plateau de Saclay, one strap of my (recent, grey, handy) North Face backpack gave way and I barely avoided a crash! The stitching at the top had unravelled, it appears, and there was no way I could fix it myself. Once safe at home, I thus called Amazon from whom I had bought it less than a year ago and was quite impressed by their handling of the matter, as they reimbursed me almost instantaneously. (Which brings the customary reminder and warning that my Amazon links are connected to my Amazon Associate account and [modest] benefits.) Anyway, this led me to concretise a pun I thought of a long while ago, namely to build a pastiche of the North Face logo where the Half-Dome representation would be turned into the half of a Viking helmet… My daughter Rachel helped me to turn this into an actual output, as shown above, but I would welcome proposals of alternative designs. The winner (if any) gets a free mug with the selected design!

## Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution?

Posted in Statistics with tags accept-reject algorithm, course, cross validated, ENSAE, MCMC, Monte Carlo Statistical Methods, posterior distribution, probability, simulation on October 27, 2017 by xi'an

**I** found this question on X validated somewhat hilarious, all the more because of the shouted KNOW! And because of the confused impression that, since one can write down π(θ|x) up to a constant, one KNOWS this distribution… It is actually one of the paradoxes of simulation that, from a mathematical perspective, once π(θ|x) is available as a function of (θ,x), all other quantities related to this distribution are mathematically perfectly and uniquely defined. From a numerical perspective, this does not help. Actually, when starting my MCMC course at ENSAE a few days later, I had the same question from a student who thought that facing a density function like

f(x) ∝ exp{−||x||² − ||x||⁴ − ||x||⁶}

was enough to immediately produce simulations from this distribution. (I also used this example to show the degeneracy of accept-reject as the dimension d of x increases, using for instance a Gamma proposal on y=||x||. The acceptance probability plunges to zero with d, with 9 acceptances out of 10⁷ for d=20.)
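The degeneracy can be sketched as follows: simulate y=||x|| from its radial density, proportional to y^(d−1) exp{−y²−y⁴−y⁶}, by accept-reject and watch the acceptance rate fall as d grows. The untuned Exp(1) proposal and the grid-based envelope constant below are my own choices; the Gamma parameters used in the course are not recorded here, so the numbers differ from the 9-in-10⁷ figure above.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_radial_target(y, d):
    # unnormalised density of y = ||x|| when f(x) ∝ exp{-||x||²-||x||⁴-||x||⁶} in dimension d
    return (d - 1) * np.log(y) - y**2 - y**4 - y**6

def log_exp_pdf(y):
    return -y                                # Exp(1) proposal, i.e. Gamma(1,1) (untuned choice)

def acceptance_rate(d, n=10**5):
    grid = np.linspace(1e-6, 5, 100_000)
    # envelope constant: log M = sup_y [log target(y) - log proposal(y)], found on a grid
    logM = np.max(log_radial_target(grid, d) - log_exp_pdf(grid))
    y = np.maximum(rng.exponential(1.0, n), 1e-300)   # guard against a zero draw
    accept = np.log(rng.uniform(size=n)) < log_radial_target(y, d) - log_exp_pdf(y) - logM
    return accept.mean()

for d in (1, 5, 20):
    print(d, acceptance_rate(d))             # the rate drops as the dimension grows
```

The radial density concentrates around its mode as d increases while the fixed proposal does not, so the envelope gets ever looser and the acceptance rate decays; tuning the proposal shape and rate with d only delays the collapse.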