**A**n alarming entry in The Guardian about the huge proportion of councils in the UK using machine-learning software to allocate benefits, detect child abuse or claim fraud. And relying blindly on the outcome of such software, despite its well-documented lack of reliability, uncertainty assessments, and warnings. Blindly in the sense that the impact of their (implemented) decisions was not even reviewed, even though a portion of the councils does not consider renewing the contracts. With the appalling statement of the CEO of one software company reported in the title. Blaming further the lack of accessibility [for their company] of the data used by the councils for the impossibility [for the company] of providing risk factors and identifying bias, in an unbelievable newspeak inversion… As pointed out by David Spiegelhalter in the article, the openness should go the other way, namely that the algorithms behind the suggestions (read decisions) should be available to understand why these decisions were made. (A whole series of Guardian articles relate to this as well, under the heading “Automating poverty”.)

## Archive for machine learning

## double descent

Posted in Books, Statistics, University life with tags double descent, France, Gare de Lyon, INRIA, machine learning, neural network, Paris, randomisation, Seine, SMILE seminar, stochastic gradient descent, training versus testing on November 7, 2019 by xi'an

**L**ast Friday, I [and a few hundred others!] went to the SMILE (Statistical Machine Learning in Paris) seminar where Francis Bach was giving a talk. (With a pleasant ride from Dauphine along the Seine river.) Francis was talking about the double descent phenomenon observed in recent papers by Belkin & al. (2018, 2019), and Mei & Montanari (2019). (As the seminar room at INRIA was quite crowded and as I was sitting X-legged on the floor close to the screen, I took a few slides from below!) The phenomenon is that the usual U curve warning about over-fitting and reproduced in most statistics and machine-learning courses can under the right circumstances be followed by a second decrease in the testing error when the number of features goes beyond the number of observations. This is rather puzzling and counter-intuitive, so I briefly checked the 2019 [8 pages] article by Belkin & al., who are studying two examples, including a standard “large p small n” Gaussian regression, where the authors state that

“However, as p grows beyond n, the test risk again decreases, provided that the model is fit using a suitable inductive bias (e.g., least norm solution).”

One explanation [I found after checking the paper] is that the variates (features) in the regression are selected at random rather than in an optimal sequential order. Double descent is missing with interpolating and deterministic estimators. Hence requiring in principle all candidate variates to be included to achieve minimal averaged error. The infinite spike is when the number p of variates is near the number n of observations. (The expectation accounts as well for the randomisation in T. Randomisation that remains an unclear feature in this framework…)

## conditional noise contrastive estimation

Posted in Books, pictures, University life with tags Charlie Geyer, conference, ICML 2018, intractable constant, logistic regression, machine learning, noise contrasting estimation, Stockholm, Sweden on August 13, 2019 by xi'an

**A**t ICML last year, Ciwan Ceylan and Michael Gutmann presented a new version of noise contrastive estimation to deal with intractable constants. While noise contrastive estimation relies upon a second independent sample to contrast with the observed sample, this approach uses instead a perturbed or noisy version of the original sample, for instance a Normal generation centred at the original datapoint. And eliminates the annoying constant by breaking the (original and noisy) samples into two groups. The probability to belong to one group or the other then does not depend on the constant, which is a very effective trick. And can be optimised with respect to the parameters of the model of interest. Recovering the score matching function of Hyvärinen (2005). While this is in line with earlier papers by Gutmann and Hyvärinen, this line of reasoning (starting with Charlie Geyer’s logistic regression) never ceases to amaze me!
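As a toy illustration of the trick, and not the authors' implementation, the sketch below assumes a centred Normal model with unknown precision, a symmetric Normal perturbation (so that the conditional noise densities cancel in the ratio), one noisy copy per datapoint, and a plain grid search instead of a proper optimiser. The function names, the grid, and the sample sizes are all hypothetical, and the loss is written up to a multiplicative factor.

```python
import numpy as np

def log_phi(x, lam, log_const=0.0):
    """Unnormalised log-density of a centred Normal with precision lam.

    log_const stands in for the unknown log normalising constant.
    """
    return -0.5 * lam * x**2 + log_const

def cnce_loss(lam, x, y, log_const=0.0):
    """Logistic loss for telling each datapoint x from its noisy copy y."""
    # With symmetric noise the log-ratio reduces to a difference of
    # unnormalised log-densities, in which log_const cancels.
    g = log_phi(x, lam, log_const) - log_phi(y, lam, log_const)
    return float(np.mean(np.logaddexp(0.0, -g)))  # stable log(1 + exp(-g))

def simulate(n=5000, true_sd=1.0, noise_sd=1.0, seed=1):
    """Data x from the model and perturbed copies y = x + Normal noise."""
    rng = np.random.default_rng(seed)
    x = true_sd * rng.standard_normal(n)
    y = x + noise_sd * rng.standard_normal(n)
    return x, y

def fit_precision(x, y, grid):
    """Grid minimiser of the loss over candidate precisions."""
    losses = [cnce_loss(lam, x, y) for lam in grid]
    return float(grid[int(np.argmin(losses))])

if __name__ == "__main__":
    x, y = simulate()
    grid = np.linspace(0.2, 3.0, 281)
    print("estimated precision:", fit_precision(x, y, grid))
```

Passing any value of `log_const` to `cnce_loss` leaves the loss unchanged, which is the whole point: the constant never needs to be known. The estimated precision should land close to the true value of one in this simulation.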

## visualising bias and unbiasedness

Posted in Books, Kids, pictures, R, Statistics, University life with tags bias, cross validated, density estimator, dispersion, machine learning, maximum likelihood estimation, normal model, Pattern Recognition and Machine Learning, plug-in estimator, variability on April 29, 2019 by xi'an

**A** question on X validated led me to wonder at the point made by Christopher Bishop in his Pattern Recognition and Machine Learning book about the MLE of the Normal variance being biased. As it is illustrated by the above graph that opposes the true and green distribution of the data (made of two points) against the estimated and red distribution. While it is true that the MLE under-estimates the variance on average, the pictures are cartoonist caricatures in their deviance permanence across three replicas. When looking at 10⁵ replicas, rather than three, and at samples of size 10, rather than 2, the distinction between using the MLE (left) and the unbiased estimator of σ² (right) is much less clear-cut.

When looking more specifically at the case n=2, the humongous variability of the density estimate completely dwarfs the bias issue:

Even when averaging over all 10⁵ replications, the difference is hard to spot (and both estimations are more dispersed than the truth!):
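For readers without the pictures at hand, the following sketch reproduces the gist of the comparison numerically rather than graphically. It is a minimal simulation, assuming a standard Normal truth with σ² = 1 and the same 10⁵ replicas as above; the function name and the seed are arbitrary choices.

```python
import numpy as np

def variance_estimates(n, reps=100_000, sigma=1.0, seed=2):
    """MLE and unbiased variance estimates over reps Normal samples of size n."""
    rng = np.random.default_rng(seed)
    samples = sigma * rng.standard_normal((reps, n))
    mle = samples.var(axis=1, ddof=0)        # divides by n
    unbiased = samples.var(axis=1, ddof=1)   # divides by n - 1
    return mle, unbiased

if __name__ == "__main__":
    for n in (2, 10):
        mle, unbiased = variance_estimates(n)
        print(f"n={n}: mean MLE={mle.mean():.3f} (sd {mle.std():.3f}), "
              f"mean unbiased={unbiased.mean():.3f} (sd {unbiased.std():.3f})")
```

The MLE averages to (n-1)σ²/n, hence about one half when n=2, but the standard deviation of either estimator across replicas is at least as large as this bias, which is why the bias is so hard to spot in any single replica.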

## tenure track position in Clermont, Auvergne

Posted in pictures, Travel, University life with tags academic position, Auvergne, Clermont-Ferrand, deep learning, France, machine learning, Puy de Sancy, Région Centre, Statistics, teaching load, tenure track on April 23, 2019 by xi'an

**M**y friend Arnaud Guillin pointed out this opening of a tenure-track professor position at his University of Clermont Auvergne, in Central France. With specialty in statistics and machine-learning, especially deep learning. The deadline for applications is 12 May 2019. (Tenure-track positions are quite rare in French universities and this offer includes a limited teaching load over three years, potential tenure and titularisation at the end of a five-year period, and is restricted to candidates who did their PhD or their postdoc abroad.)

## Stein’s method in machine learning [workshop]

Posted in pictures, Running, Statistics, Travel, University life with tags California, Charles Stein, ICML 2019, LA, Long Beach, machine learning, Stein's method, Stein's paradox, University of Warwick, USA, workshop on April 5, 2019 by xi'an

**T**here will be an ICML workshop on Stein’s method in machine learning & statistics, next July 14 *or* 15, located in Long Beach, CA. Organised by François-Xavier Briol (formerly Warwick), Lester Mackey, Chris Oates (formerly Warwick), Qiang Liu, and Larry Goldstein. To quote from the webpage of the workshop

Stein’s method is a technique from probability theory for bounding the distance between probability measures using differential and difference operators. Although the method was initially designed as a technique for proving central limit theorems, it has recently caught the attention of the machine learning (ML) community and has been used for a variety of practical tasks. Recent applications include goodness-of-fit testing, generative modeling, global non-convex optimisation, variational inference, de novo sampling, constructing powerful control variates for Monte Carlo variance reduction, and measuring the quality of Markov chain Monte Carlo algorithms.

Speakers include Anima Anandkumar, Lawrence Carin, Louis Chen, Andrew Duncan, Arthur Gretton, and Susan Holmes. I am quite sorry to miss two workshops dedicated to Stein’s work in a row, the other one being at NUS, Singapore, around the Stein paradox.

## position in statistics and/or machine learning at ENSAE ParisTech‐CREST

Posted in pictures, University life with tags assistant professor position, associate professor position, CREST, Ecole Nationale de la Statistique et de l'Administration Economique, ENSAE, job offer, machine learning, Paris-Saclay campus on March 28, 2019 by xi'an

ENSAE ParisTech and CREST are currently inviting applications for a position of Assistant or Associate Professor in Statistics or Machine Learning.

The appointment starts in September, 2019, at the earliest. At the level of Assistant Professor, the position is for an initial three-year term renewable for another three years before the tenure evaluation. Salary is competitive according to qualifications. The teaching duties are reduced compared to French university standards. At the time of appointment, knowledge of French is not required but it is expected that the appointee will acquire a workable knowledge of French within a reasonable time.

**Candidate Profile**

– PhD in Statistics or Machine Learning.

– Outstanding research, including subjects in high-dimensional statistics and machine learning.

– Publications in leading international journals in Statistics or leading outlets in Machine Learning.

– Demonstrated ability to teach courses in Mathematics, Statistics and Machine Learning for engineers and to supervise projects in Applied Statistics.

The successful candidate is expected to teach at least one course in mathematics, applied mathematics or introductory statistics at the undergraduate level, and one course in the “Data Science, Statistics and Machine Learning” specialization track during the third year of ENSAE (Master level).

Applications should be submitted (in French or in English) by email to recruitment@ensae.fr :

– Curriculum vitae;

– Statement of research and teaching interests (2-4 pages);

– Names and addresses of three or more individuals willing to provide letters of reference.

Deadline for applications : **April 29, 2019**.

Selected candidates will be invited to present their work and project at ENSAE‐CREST.