Archive for INRIA

ABC in… everywhere [programme]

Posted in Mountains, pictures, Statistics, Travel, University life on April 8, 2021 by xi'an

The ABC in Svalbard workshop is taking place on-line next week (and most sadly not in Svalbard). The programme is available on the ABC site. It starts (in Australia) at 4:00 GMT (14:00 AEST) and finishes (in France) at 15:30 GMT (17:30 CEST). Registration is free but needed to access the Zoom codes! See you on Zoom next week!!!

missing bit?

Posted in Books, Statistics, University life on January 9, 2021 by xi'an

Nature of 7 December 2020 has a Nature Index (a supplement made of a series of articles, more journalistic than scientific, with corporate backup, which “have no influence over the content”) on Artificial Intelligence, including the above graph representing “the top 200 collaborations among 146 institutions based between 2015 and 2019, sized according to each institution’s share in artificial intelligence”, with only the UK, Germany, Switzerland and Italy identified for Europe… Missing, e.g., the output from France and from its major computer science institute, INRIA. Maybe because “the articles picked up by [their] database search concern specific applications of AI in the life sciences, physical sciences, chemistry, and Earth and environmental sciences”. Or maybe because of the identification of INRIA as such.

“Access to massive data sets on which to train machine-learning systems is one advantage that both the US and China have. Europe, on the other hand, has stringent data laws, which protect people’s privacy, but limit its resources for training AI algorithms. So, it seems unlikely that Europe will produce very sophisticated AI as a consequence”

This comment sort of contradicts the attached articles calling for a more ethical AI, like making AI more transparent and robust. While unrestricted access to personal data helps with the social engineering and control favoured by dictatorships and corporate behemoths, a culture of data privacy may (and should) lead to developing new methodology for working with protected data (as in an Alan Turing Institute project) and to inspiring more trust from the public. Working with less data does not mean less sophistication in handling it, quite the opposite! Another clash of events is that one of the six trailblazers portrayed in the special supplement is Timnit Gebru, “former co-lead of the Ethical AI Team at Google”, who parted ways with Google at the time the issue was published. (See Andrew’s blog for a discussion of her firing, and the MIT Technology Review for an analysis of the paper potentially at the source of it.)

Francis Bach à l’Académie des Sciences

Posted in Statistics on April 8, 2020 by xi'an

Congrats to Francis Bach, freshly nominated to the French Academy of Sciences, joining Stéphane Mallat (2014) and Éric Moulines (2017) as data science academicians!

double descent

Posted in Books, Statistics, University life on November 7, 2019 by xi'an

Last Friday, I [and a few hundred others!] went to the SMILE (Statistical Machine Learning in Paris) seminar where Francis Bach was giving a talk. (With a pleasant ride from Dauphine along the Seine river.) Francis was talking about the double descent phenomenon observed in recent papers by Belkin & al. (2018, 2019), and Mei & Montanari (2019). (As the seminar room at INRIA was quite crowded and as I was sitting X-legged on the floor close to the screen, I took a few slides from below!) The phenomenon is that the usual U curve warning about over-fitting and reproduced in most statistics and machine-learning courses can, under the right circumstances, be followed by a second decrease in the testing error when the number of features goes beyond the number of observations. This is rather puzzling and counter-intuitive, so I briefly checked the 2019 [8 pages] article by Belkin & al., who are studying two examples, including a standard “large p small n” Gaussian regression, where the authors state that

“However, as p grows beyond n, the test risk again decreases, provided that the model is fit using a suitable inductive bias (e.g., least norm solution).”

One explanation [I found after checking the paper] is that the variates (features) in the regression are selected at random rather than in an optimal sequential order, so double descent is missing with interpolating and deterministic estimators, hence requiring in principle all candidate variates to be included to achieve minimal averaged error. The infinite spike occurs when the number p of variates is near the number n of observations. (The expectation accounts as well for the randomisation in T, a randomisation that remains an unclear feature in this framework…)
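The second descent is easy to reproduce numerically: fit a Gaussian regression with only the first p of D randomly ordered features, using the minimum-norm least-squares solution (via the pseudo-inverse), and watch the test risk spike near p = n before dropping again. Here is a minimal numpy sketch in the spirit of the Belkin & al. example; the parameter values (n = 40, D = 200, σ = 0.5) and the averaging over replications are my own choices, not taken from the paper.

```python
import numpy as np

def avg_test_risk(p, n=40, D=200, n_test=500, sigma=0.5, reps=20, seed=0):
    """Average test risk of the minimum-norm least-squares fit that
    uses only the first p of D Gaussian features (the features being
    in a random order, as in the Belkin & al. regression example)."""
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(reps):
        beta = rng.normal(size=D) / np.sqrt(D)       # true coefficients
        X = rng.normal(size=(n, D))                  # training design
        Xt = rng.normal(size=(n_test, D))            # test design
        y = X @ beta + sigma * rng.normal(size=n)
        yt = Xt @ beta + sigma * rng.normal(size=n_test)
        # least-norm solution: pinv gives the minimum-norm interpolator
        # as soon as p exceeds n
        bhat = np.linalg.pinv(X[:, :p]) @ y
        risks.append(np.mean((yt - Xt[:, :p] @ bhat) ** 2))
    return float(np.mean(risks))

# risk below, at, and beyond the interpolation threshold p = n = 40
risk_under, risk_at, risk_over = (avg_test_risk(p) for p in (10, 40, 200))
```

With these settings the averaged risk typically peaks sharply at the interpolation threshold p = n and falls back below that peak when p grows to D, which is the double-descent signature; the unused D − p features act as extra noise, which is why the spike is so pronounced.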

postdoctoral position in computational statistical physics and machine learning

Posted in Statistics on February 12, 2019 by xi'an