## double descent

Posted in Books, Statistics, University life on November 7, 2019 by xi'an

Last Friday, I [and a few hundred others!] went to the SMILE (Statistical Machine Learning in Paris) seminar where Francis Bach was giving a talk. (With a pleasant ride from Dauphine along the Seine river.) Francis was talking about the double descent phenomenon observed in recent papers by Belkin & al. (2018, 2019), and Mei & Montanari (2019). (As the seminar room at INRIA was quite crowded and as I was sitting X-legged on the floor close to the screen, I took a few slides from below!) The phenomenon is that the usual U curve warning about over-fitting, reproduced in most statistics and machine-learning courses, can under the right circumstances be followed by a second decrease in the testing error when the number of features goes beyond the number of observations. This is rather puzzling and counter-intuitive, so I briefly checked the 2019 [8 pages] article by Belkin & al., who study two examples, including a standard “large p small n” Gaussian regression, where the authors state that

“However, as p grows beyond n, the test risk again decreases, provided that the model is fit using a suitable inductive bias (e.g., least norm solution).”

One explanation [I found after checking the paper] is that the variates (features) in the regression are selected at random rather than in an optimal sequential order. Double descent is missing with interpolating and deterministic estimators, hence requiring in principle all candidate variates to be included to achieve minimal averaged error. The infinite spike occurs when the number p of variates is near the number n of observations. (The expectation accounts as well for the randomisation in T, a randomisation that remains an unclear feature in this framework…)
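To see the phenomenon for myself, here is a minimal simulation sketch, not the authors' code: it assumes isotropic Gaussian features, a unit-norm coefficient vector, and a subset T of p features drawn uniformly at random among D candidates, which is only loosely modelled on the Gaussian regression example. The function names and the values of n, D and sigma are my own hypothetical choices. The fit uses the pseudo-inverse, which is ordinary least squares when p is below n and the least-norm interpolating solution when p is above n.

```python
import numpy as np


def min_norm_test_error(n, D, p, sigma, rng, n_test=500):
    """Test MSE of a least-squares fit on p randomly selected features.

    Assumed setup (not taken from the paper's code): x ~ N(0, I_D),
    y = x'beta + sigma * noise, with ||beta|| = 1.
    """
    beta = rng.standard_normal(D)
    beta /= np.linalg.norm(beta)
    X = rng.standard_normal((n, D))
    y = X @ beta + sigma * rng.standard_normal(n)

    # T: random subset of p features, drawn without replacement
    T = rng.choice(D, size=p, replace=False)

    # pinv returns the OLS solution when p <= n and the
    # least-norm interpolating solution when p > n
    beta_T = np.linalg.pinv(X[:, T]) @ y

    # fresh data to estimate the test error; unselected features get weight 0
    X_new = rng.standard_normal((n_test, D))
    y_new = X_new @ beta + sigma * rng.standard_normal(n_test)
    return float(np.mean((y_new - X_new[:, T] @ beta_T) ** 2))


def risk_curve(n=40, D=100, sigma=0.5, ps=(5, 10, 20, 30, 40, 50, 70, 100),
               reps=50, seed=0):
    """Average test error over replications (data and T redrawn) for each p."""
    rng = np.random.default_rng(seed)
    return {
        p: float(np.mean([min_norm_test_error(n, D, p, sigma, rng)
                          for _ in range(reps)]))
        for p in ps
    }


if __name__ == "__main__":
    for p, risk in risk_curve().items():
        print(f"p = {p:3d}   average test error = {risk:.3f}")
```

Under these assumptions, the averaged test error should blow up around p = n and come back down as p grows towards D, the average being taken over both the data and the random choice of T.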

## off to New York

Posted in Books, pictures, Statistics, Travel, University life on March 29, 2015 by xi'an

I am off to New York City for two days, giving a seminar at Columbia tomorrow and visiting Andrew Gelman there. My talk will be about testing as mixture estimation, with slides similar to the Nice ones below if slightly upgraded and augmented during the flight to JFK. Looking at the past seminar speakers, I noticed we were three speakers from Paris in the last fortnight, with Ismael Castillo and Paul Doukhan (in the Applied Probability seminar) preceding me. Is there a significant bias there?!

## David Blei smile in Paris (seminar)

Posted in Statistics, Travel, University life on October 30, 2013 by xi'an

Nicolas Chopin just reminded me of a seminar given by David Blei in Paris tomorrow (at 4pm, SMILE seminar, INRIA, 23 avenue d’Italie, 5th floor, orange room) on Stochastic Variational Inference and Scalable Topic Models, a machine learning seminar that I will alas miss, being busy giving mine at CMU. Here is the abstract:

```
Probabilistic topic modeling provides a suite of tools for analyzing
large collections of electronic documents.  With a collection as
input, topic modeling algorithms uncover its underlying themes and
decompose its documents according to those themes.  We can use topic
models to explore the thematic structure of a large collection of
documents or to solve a variety of prediction problems about text.

Topic models are based on hierarchical mixed-membership models,
statistical models where each document expresses a set of components
(called topics) with individual per-document proportions. The
computational problem is to condition on a collection of observed
documents and estimate the posterior distribution of the topics and
per-document proportions. In modern data sets, this amounts to
posterior inference with billions of latent variables.

How can we cope with such data?  In this talk I will describe
stochastic variational inference, a general algorithm for
approximating posterior distributions that are conditioned on massive
data sets.  Stochastic inference is easily applied to a large class of
hierarchical models, including time-series models, factor models, and
Bayesian nonparametric models.  I will demonstrate its application to
topic models fit with millions of articles.  Stochastic inference
opens the door to scalable Bayesian computation for modern data
```