Archive for Université de Montpellier

the new DIYABC-RF

Posted in Books, pictures, R, Statistics, Wines on April 15, 2021 by xi'an

My friends and co-authors from Montpellier released last month the third version of the DIYABC software, DIYABC-RF, which includes and promotes the use of random forests for parameter inference and model selection, in connection with Louis Raynal’s thesis. Like the earlier versions of DIYABC, it is intended for population genetic applications. Bienvenue!!!

The software DIYABC Random Forest (hereafter DIYABC-RF) v1.0 is composed of three parts: the dataset simulator, the Random Forest inference engine and the graphical user interface. The whole is packaged as a standalone and user-friendly graphical application named DIYABC-RF GUI and available at https://diyabc.github.io. The different developer and user manuals for each component of the software are available on the same website. DIYABC-RF is multithreaded software available on three operating systems: GNU/Linux, Microsoft Windows and MacOS. The program can be used through a modern and user-friendly graphical interface designed as an R shiny application (Chang et al. 2019). For a fluid and simplified user experience, this interface is available through a standalone application, which does not require installing R or any dependencies and hence can be used independently. The application is also implemented in an R package providing a standard shiny web application (with the same graphical interface) that can be run locally as any shiny application, or hosted as a web service to provide a DIYABC-RF server for multiple users.

gone South [jatp]

Posted in Mountains, pictures, Statistics, Travel, University life, Wines on March 27, 2021 by xi'an

flow contrastive estimation

Posted in Books, Statistics on March 15, 2021 by xi'an

On the flight back from Montpellier, last week, I read a 2019 paper by Gao et al. revisiting maximum likelihood estimation of the parameter of a parametric family when the normalising constant Z=Z(θ) is unknown, via noise-contrastive estimation à la Gutmann & Hyvärinen (or à la Charlie Geyer). The approach treats the normalising constant Z as an extra parameter (as in Kong et al.) and the classification probability as an objective function, calling it a likelihood, which it is not in my opinion as (i) the allocation to the groups is not random and (ii) the original density of the actual observations does not appear in the so-called likelihood.
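
As a minimal sketch of the noise-contrastive idea, and not the setting of Gao et al., assume an unnormalised Gaussian model exp(-θx²/2) with c = -log Z treated as a free parameter, a fixed N(0, 2²) noise distribution, and equal numbers of data and noise points. The names `fit_nce`, `theta_hat` and `c_hat`, the choice of noise, and the damped Newton solver are all assumptions made for illustration. The objective is the classification log-probability discussed above, which in this toy case is a concave logistic regression in (θ, c).

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_sigmoid(t):
    # Numerically stable log(sigmoid(t)).
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))

def log_q(u, s):
    # Log-density of the noise distribution N(0, s^2) (an assumption of this sketch).
    return -0.5 * (u / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)

def logit(u, theta, c, s):
    # G(u) = log p_model(u) - log q(u), with log p_model(u) = -theta*u^2/2 + c
    # and c = -log Z treated as an extra parameter.
    return -0.5 * theta * u * u + c - log_q(u, s)

def objective(theta, c, data, noise, s):
    # Average log-probability of labelling data as data and noise as noise.
    j = sum(log_sigmoid(logit(x, theta, c, s)) for x in data) / len(data)
    j += sum(log_sigmoid(-logit(y, theta, c, s)) for y in noise) / len(noise)
    return j

def grad_hess(theta, c, data, noise, s):
    # Gradient and Hessian of the objective with respect to (theta, c).
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for sample, is_data in ((data, True), (noise, False)):
        w = 1.0 / len(sample)
        for u in sample:
            p = sigmoid(logit(u, theta, c, s))
            f = (-0.5 * u * u, 1.0)  # (dG/dtheta, dG/dc)
            r = (1.0 - p) if is_data else -p
            for i in range(2):
                g[i] += w * r * f[i]
                for k in range(2):
                    h[i][k] -= w * p * (1.0 - p) * f[i] * f[k]
    return g, h

def fit_nce(data, noise, s, theta=0.5, c=0.0, iters=25):
    # Newton ascent with step halving on the concave objective.
    for _ in range(iters):
        g, h = grad_hess(theta, c, data, noise, s)
        if abs(g[0]) + abs(g[1]) < 1e-10:
            break
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        d_theta = (h[1][1] * g[0] - h[0][1] * g[1]) / det
        d_c = (h[0][0] * g[1] - h[1][0] * g[0]) / det
        step, current = 1.0, objective(theta, c, data, noise, s)
        # Halve the Newton step while it would decrease the objective.
        while step > 1e-8 and objective(
                theta - step * d_theta, c - step * d_c, data, noise, s) < current:
            step *= 0.5
        theta, c = theta - step * d_theta, c - step * d_c
    return theta, c

random.seed(1)
n, s = 4000, 2.0
data = [random.gauss(0.0, 1.0) for _ in range(n)]   # true theta = 1
noise = [random.gauss(0.0, s) for _ in range(n)]
theta_hat, c_hat = fit_nce(data, noise, s)
# Compare c_hat with the true value -log Z = -0.5 * log(2 * pi).
print(theta_hat, c_hat, -0.5 * math.log(2.0 * math.pi))
```

In this toy case both θ and the log-normalising constant are recovered from the classification objective alone, which is the sense in which Z is "estimated" as a parameter; note that the density of the observations indeed never enters the objective except through the classifier.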

“When q appears on the right of KL-divergence [against p], it is forced to cover most of the modes of p. When q appears on the left of KL-divergence, it tends to chase the major modes of p while ignoring the minor modes.”
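
The asymmetry in this quote can be checked numerically on a toy case. The sketch below is an illustration under assumed choices, none of which come from the paper: a two-component Gaussian mixture as p (weights 0.7 and 0.3), a single Gaussian as q, Riemann sums on a grid for the integrals, and a coarse grid search over the mean and standard deviation of q.

```python
import math

def log_normal(x, m, s):
    # Log-density of N(m, s^2) at x.
    return -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)

def log_p(x):
    # Bimodal target: major mode at +2 (weight 0.7), minor mode at -2 (weight 0.3).
    return math.log(0.7 * math.exp(log_normal(x, 2.0, 0.5))
                    + 0.3 * math.exp(log_normal(x, -2.0, 0.5)))

DX = 0.05
GRID = [-10.0 + DX * i for i in range(401)]
LOG_P = [log_p(x) for x in GRID]

def kl_p_q(m, s):
    # KL(p || q), with q on the right, approximated by a Riemann sum.
    return sum(math.exp(lp) * (lp - log_normal(x, m, s))
               for x, lp in zip(GRID, LOG_P)) * DX

def kl_q_p(m, s):
    # KL(q || p), with q on the left, approximated by a Riemann sum.
    total = 0.0
    for x, lp in zip(GRID, LOG_P):
        lq = log_normal(x, m, s)
        total += math.exp(lq) * (lq - lp)
    return total * DX

# Coarse grid search over the mean and standard deviation of q.
means = [0.5 * i for i in range(-8, 9)]
sds = [0.5 + 0.25 * j for j in range(15)]
candidates = [(m, s) for m in means for s in sds]
m_fwd, s_fwd = min(candidates, key=lambda ms: kl_p_q(*ms))
m_rev, s_rev = min(candidates, key=lambda ms: kl_q_p(*ms))
print("q on the right:", m_fwd, s_fwd)
print("q on the left: ", m_rev, s_rev)
```

With q on the right, the best single Gaussian should be broad and sit between the two modes (this is moment matching), while with q on the left it should lock onto the major mode and ignore the minor one.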

The flow in the title indicates that the contrastive distribution q is estimated by a flow-based estimator, namely the transform of a basic noise distribution via easily invertible and differentiable transforms, for instance with lower triangular Jacobians. This flow is also estimated directly from the data but the authors complain this estimation is not good enough for noise contrastive estimation and suggest instead resorting to a GAN version where the classification log-probability is maximised in the model parameters and minimised in the flow parameters. Except that I feel it misses the true likelihood part. In other words, why on Hyperion would estimating all θ, Z=Z(θ), and α at once improve the estimation of Z?
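
To make the "lower triangular Jacobian" remark concrete, here is a minimal sketch of the change-of-variables computation for the simplest possible flow, a two-dimensional affine map x = Lz + b of standard normal noise with L lower triangular and positive on the diagonal. This is a hypothetical stand-in chosen for illustration; the flows used in practice stack nonlinear layers, but the density evaluation follows the same pattern. The matrix `L`, shift `b` and point `x` are arbitrary assumed values.

```python
import math

def flow_log_density(x, L, b):
    # Invert x = L z + b by forward substitution (L is lower triangular).
    z0 = (x[0] - b[0]) / L[0][0]
    z1 = (x[1] - b[1] - L[1][0] * z0) / L[1][1]
    # Log-density of the standard normal base noise at z.
    log_base = -0.5 * (z0 * z0 + z1 * z1) - math.log(2.0 * math.pi)
    # Triangular Jacobian: its log-determinant is the sum of the log-diagonal.
    log_det = math.log(L[0][0]) + math.log(L[1][1])
    return log_base - log_det

def gaussian_log_density(x, L, b):
    # Direct bivariate normal log-density with covariance L L^T, for comparison.
    s00 = L[0][0] ** 2
    s01 = L[0][0] * L[1][0]
    s11 = L[1][0] ** 2 + L[1][1] ** 2
    det = s00 * s11 - s01 * s01
    d0, d1 = x[0] - b[0], x[1] - b[1]
    quad = (s11 * d0 * d0 - 2.0 * s01 * d0 * d1 + s00 * d1 * d1) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)

L = [[1.5, 0.0], [-0.7, 0.4]]   # assumed lower triangular map
b = [0.3, -1.0]                 # assumed shift
x = [1.0, 0.2]                  # point at which to evaluate the density
flow_value = flow_log_density(x, L, b)
direct_value = gaussian_log_density(x, L, b)
print(flow_value, direct_value)
```

The two evaluations agree up to rounding, and the flow version only needs a forward substitution and the diagonal of L, which is what makes triangular Jacobians attractive when q must be both sampled from and evaluated.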

The other aspect that puzzles me is that (12) uses integrated classification probabilities (with the unknown Z as extra parameter), rather than conditioning on the data, Bayes-like. (The difference between (12) and GAN is that here the discriminator function is constrained.) Esp. when the first expectation is replaced with its empirical version.

Monsieur le Président [reposted]

Posted in Books, Statistics, University life on April 11, 2020 by xi'an

Let us carry out screening campaigns on representative samples of the population!

Mr President of the Republic, as you rightly indicated, we are at war and everything must be done to combat the spread of COVID-19. You had the wisdom to surround yourself with a Scientific Council and an Analysis, Research and Expertise Committee, both competent, and, as you know, applied mathematicians and statisticians have a role to play in this battle. Yes, to predict the evolution of the epidemic, mathematical models are used at different scales. This allows us to estimate the number of people infected in the coming weeks and months. We are at war and these predictions are essential to the development of the best control strategy. They inform political decisions. It is notably with the help of these items of information that the confinement of the French population has been decided and renewed.

Mr President, we are at war and these predictions must be as robust as possible. The more precise they are, the better the decisions they will guide. Mathematical models include a number of unknown parameters whose values should be set based on expert advice or data. These include the transmission rate, incubation time, contagion time, and, of course, to initialize dynamic mathematical models, the number of covered individuals. To enjoy more reliable predictions, it is necessary to better estimate such crucial quantities. The proportion of healthy carriers appears to be a particularly critical parameter.

Mr President, we are at war and we must assess the proportions of healthy carriers by geographic area. We do not currently have the means to implement massive screenings, but we can carry out surveys. This means, for a well-defined geographic area, running biological tests on samples of individuals that are drawn at random and are representative of the total population of the area. Such data would supplement those already available and would considerably reduce the uncertainty in model predictions.

Mr President, we are at war, let us give ourselves the means to fight effectively against this scourge. Thanks to a significant effort, the number of individuals that can be tested daily is increasing significantly; let us devote some of these available tests to representative samples. For each individual drawn at random, let us perform a nasal swab and a blood test, and collect clinical data and other items of information on their observance of barrier measures. This would provide important information on the percentage of immunized French people. These data would open the possibility to feed mathematical models wisely, and hence to make informed decisions about the different strategies of deconfinement.

Mr President, we are at war. This strategy, which could at first be deployed only in the most affected sectors, is, we believe, essential. It is doable: designing the survey and determining a representative sample is not an issue, and going to the homes of the people in the sample, taking samples, and having them fill out a questionnaire is also perfectly achievable if we give ourselves the means to do so. You only have to decide that a few of the available PCR tests and serological tests will be devoted to these statistical studies. In Paris and in the Grand Est, for instance, a mere few thousand tests on a representative population of properly selected individuals could better assess the situation and help in taking informed decisions.

Mr President, a proposal to this effect has been presented to the Scientific Council and to the Analysis, Research and Expertise Committee that you have set up, by a group of mathematicians at École Polytechnique with Professor Josselin Garnier at their head. You will realise by reading this tribune that the statistician that I am supports it very strongly. I am in no way disputing the competence of the councils which support you, but you have to act quickly and, I repeat, only dedicate a few thousand tests to statistical studies. Emergency is everywhere; assistance to the patients, to people in intensive care, must of course be the priority, but let us attempt to anticipate as well. We do not have the means to massively test the entire population, so let us run polls.

Jean-Michel Marin
Professeur à l’Université de Montpellier
Président de la Société Française de Statistique
Directeur de l’Institut Montpelliérain Alexander Grothendieck
Vice-Doyen de la Faculté des Sciences de Montpellier

another mirror of ABC in Gre[e]noble

Posted in Statistics on March 3, 2020 by xi'an

There will now be a second mirror workshop of ABC in Grenoble, taking place at the Université de Montpellier, more precisely at the Alexander Grothendieck Montpellier Institute, Building 9, room 430 (4th floor), Triolet Campus. It is organised by my friend Jean-Michel Marin. Great to see a mirror at one of the major breeding places of ABC, where I personally heard of ABC for the first time and met several of the main A[B]Ctors..! The dates are 19-20 March, with talks transmitted from 9am to 5pm [GMT+1]. Since the video connection can accommodate 1918 more mirrors, if anyone else is interested in organising another mirror, please contact me for technical details.