Archive for machine learning

off to BayesComp 20, Gainesville

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on January 7, 2020 by xi'an

postdoc in Bayesian machine learning in Berlin [reposted]

Posted in R, Statistics, Travel, University life with tags , , , , , , , , , , , , on December 24, 2019 by xi'an

The working group of Statistics at Humboldt University of Berlin invites applications for one Postdoctoral research fellow (full-time employment, 3 years with extension possible) to contribute to the research on mathematical and statistical aspects of (Bayesian) learning approaches. The research positions are associated with the Emmy Noether group Regression Models beyond the Mean – A Bayesian Approach to Machine Learning and working group of Applied Statistics at the School of Business and Economics at Humboldt-Universität Berlin. Opportunities for own scientific qualification (PhD)/career development are provided, see an overview and further links. The positions are to be filled at the earliest possible date and funded by the German Research Foundation (DFG) within the Emmy Noether programme.

Requirements:
– an outstanding PhD in Statistics, Mathematics, or related field with specialisation in Statistics, Data Science or Mathematics;
– a strong background in at least one of the following fields: mathematical statistics, computational methods, Bayesian statistics, statistical learning, advanced regression modelling;
– a thorough mathematical understanding.
– substantial experience in scientific programming with Matlab, Python, C/C++, R or similar;
– strong interest in developing novel statistical methodology and its applications in various fields such as economics or natural and life sciences;
– a very good communication skills and team experience, proficiency of the written and spoken English language (German is not obligatory).

Opportunities:
We offer the unique environment of young researchers and leading international experts in the fields. The vibrant international network includes established collaborations in Singapore and Australia. The positions offer potential to closely work with several applied sciences. Information about the research profile of the research group and further contact details can be found here. The positions are paid according to the Civil Service rates of the German States “TV-L”, E13 (if suitably qualified).

Applications should include:
– a CV with list of publications
– a motivational statement (at most one page) explaining the applicant’s interest in the announced position as well as their relevant skills and experience
– copies of degrees/university transcripts
– names and email addresses of at least two professors that may provide letters of recommendation directly to the hiring committee Applications should be sent as a single PDF file to: Prof. Dr. Nadja Klein (nadja.klein[at]hu-berlin.de), whom you may also contact for questions concerning this job post. Please indicate “Research Position Emmy Noether”.

Application deadline: 31st of January 2020

HU is seeking to increase the proportion of women in research and teaching, and specifically encourages qualified female scholars to apply. Severely disabled applicants with equivalent qualifications will be given preferential consideration. People with an immigration background are specifically encouraged to apply. Since we will not return your documents, please submit copies in the application only.

no dichotomy between efficiency and interpretability

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , , on December 18, 2019 by xi'an

“…there are actually a lot of applications where people do not try to construct an interpretable model, because they might believe that for a complex data set, an interpretable model could not possibly be as accurate as a black box. Or perhaps they want to preserve the model as proprietary.”

One article I found quite interesting in the second issue of HDSR is “Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition” by Cynthia Rudin and Joanna Radin, which describes the setting of a NeurIPS competition last year, the Explainable Machine Learning Challenge, of which I was blissfully unaware. The goal was to construct an operational black box predictor fpr credit scoring and turn it into something interpretable. The authors explain how they built instead a white box predictor (my terms!), namely a linear model, which could not be improved more than marginally by a black box algorithm. (It appears from the references that these authors have a record of analysing black-box models in various setting and demonstrating that they do not always bring more efficiency than interpretable versions.) While this is but one example and even though the authors did not win the challenge (I am unclear why as I did not check the background story, writing on the plane to pre-NeuriPS 2019).

I find this column quite refreshing and worth disseminating, as it challenges the current creed that intractable functions with hundreds of parameters will always do better, if only because they are calibrated within the box and have eventually difficulties to fight over-fitting within (and hence under-fitting outside). This is also a difficulty with common statistical models, but having the ability to construct error evaluations that show how quickly the prediction efficiency deteriorates may prove the more structured and more sparsely parameterised models the winner (of real world competitions).

we have never been unable to develop a reliable predictive model

Posted in Statistics with tags , , , , , , , , , , , , , , , on November 10, 2019 by xi'an

An alarming entry in The Guardian about the huge proportion of councils in the UK using machine-learning software to allocate benefits, detect child abuse or claim fraud. And relying blindly on the outcome of such software, despite their well-documented lack of reliability, uncertainty assessments, and warnings. Blindly in the sense that the impact of their (implemented) decision was not even reviewed, even though a portion of the councils does not consider renewing the contracts. With the appalling statement of the CEO of one software company reported in the title. Blaming further the lack of accessibility [for their company] of the data used by the councils for the impossibility [for the company] of providing risk factors and identifying bias, in an unbelievable newspeak inversion… As pointed out by David Spiegelhalter in the article, the openness should go the other way, namely that the algorithms behind the suggestions (read decisions) should be available to understand why these decisions were made. (A whole series of Guardian articles relate to this as well, under the heading “Automating poverty”.)

double descent

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on November 7, 2019 by xi'an

Last Friday, I [and a few hundred others!] went to the SMILE (Statistical Machine Learning in Paris) seminar where Francis Bach was giving a talk. (With a pleasant ride from Dauphine along the Seine river.) Fancis was talking about the double descent phenomenon observed in recent papers by Belkin & al. (2018, 2019), and Mei & Montanari (2019). (As the seminar room at INRIA was quite crowded and as I was sitting X-legged on the floor close to the screen, I took a few slides from below!) The phenomenon is that the usual U curve warning about over-fitting and reproduced in most statistics and machine-learning courses can under the right circumstances be followed by a second decrease in the testing error when the number of features goes beyond the number of observations. This is rather puzzling and counter-intuitive, so I briefkly checked the 2019 [8 pages] article by Belkin & al., who are studying two examples, including a standard “large p small n” Gaussian regression. where the authors state that

“However, as p grows beyond n, the test risk again decreases, provided that the model is fit using a suitable inductive bias (e.g., least norm solution). “

One explanation [I found after checking the paper] is that the variates (features) in the regression are selected at random rather than in an optimal sequential order. Double descent is missing with interpolating and deterministic estimators. Hence requiring on principle all candidate variates to be included to achieve minimal averaged error. The infinite spike is when the number p of variate is near the number n of observations. (The expectation accounts as well for the randomisation in T. Randomisation that remains an unclear feature in this framework…)

conditional noise contrastive estimation

Posted in Books, pictures, University life with tags , , , , , , , , on August 13, 2019 by xi'an

At ICML last year, Ciwan Ceylan and Michael Gutmann presented a new version of noise constrative estimation to deal with intractable constants. While noise contrastive estimation relies upon a second independent sample to contrast with the observed sample, this approach uses instead a perturbed or noisy version of the original sample, for instance a Normal generation centred at the original datapoint. And eliminates the annoying constant by breaking the (original and noisy) samples into two groups. The probability to belong to one group or the other then does not depend on the constant, which is a very effective trick. And can be optimised with respect to the parameters of the model of interest. Recovering the score matching function of Hyvärinen (2005). While this is in line with earlier papers by Gutmann and Hyvärinen, this line of reasoning (starting with Charlie Geyer’s logistic regression) never ceases to amaze me!

visualising bias and unbiasedness

Posted in Books, Kids, pictures, R, Statistics, University life with tags , , , , , , , , , on April 29, 2019 by xi'an

A question on X validated led me to wonder at the point made by Christopher Bishop in his Pattern Recognition and Machine Learning book about the MLE of the Normal variance being biased. As it is illustrated by the above graph that opposes the true and green distribution of the data (made of two points) against the estimated and red distribution. While it is true that the MLE under-estimates the variance on average, the pictures are cartoonist caricatures in their deviance permanence across three replicas. When looking at 10⁵ replicas, rather than three, and at samples of size 10, rather than 2, the distinction between using the MLE (left) and the unbiased estimator of σ² (right).

When looking more specifically at the case n=2, the humongous variability of the density estimate completely dwarfs the bias issue:

Even when averaging over all 10⁵ replications, the difference is hard to spot (and both estimations are more dispersed than the truth!):