Archive for Statistics

ERC descriptors

Posted in Statistics, Travel, University life on November 9, 2020 by xi'an

Here are the descriptors (or keywords) validated by the European Research Council (ERC) for submitting grant proposals. The recent addition of PE1_15 in the Mathematics panel should help when submitting more methodological projects:

PE1_14 Mathematical statistics
PE1_15 Generic statistical methodology and modelling
PE1_19 Scientific computing and data processing

even though other panels could prove equally suited for some projects, as in Computer Science and Informatics,

PE6_7 Artificial intelligence, intelligent systems, natural language processing
PE6_10 Web and information systems, data management systems, information retrieval and digital libraries, data fusion
PE6_11 Machine learning, statistical data processing and applications using signal processing (e.g. speech, image, video)
PE6_12 Scientific computing, simulation and modelling tools
PE6_13 Bioinformatics, bio-inspired computing, and natural computing

in Systems and Communication Engineering,

PE7_7 Signal processing

in Integrative Biology,

LS2_11 Bioinformatics and computational biology
LS2_12 Biostatistics

in Prevention, Diagnosis and Treatment of Human Diseases,

LS7_1 Medical imaging for prevention, diagnosis and monitoring of diseases
LS7_2 Medical technologies and tools (including genetic tools and biomarkers) for prevention, diagnosis, monitoring and treatment of diseases

and in Social Sciences and Humanities,

SH1_6 Econometrics; operations research
SH4_9 Theoretical linguistics; computational linguistics

frontier of simulation-based inference

Posted in Books, Statistics, University life on June 11, 2020 by xi'an

“This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, `The Science of Deep Learning,’ held March 13–14, 2019, at the National Academy of Sciences in Washington, DC.”

A paper by Kyle Cranmer, Johann Brehmer, and Gilles Louppe on the frontier of simulation-based inference just appeared in PNAS. Sounding more like an op-ed than a research paper producing new input. Or at least like a review. Providing a quick introduction to simulators, inference, and ABC. Stating the shortcomings of simulation-based inference as threefold:

  1. costly, since it requires a large number of simulated samples.
  2. losing information through the use of insufficient summary statistics or poor non-parametric approximations of the sampling density.
  3. wasteful, since new datasets require new computational efforts, primarily for ABC, whereas learning the likelihood function (as a function of both the parameter θ and the data x) is only done once.

And the difficulties increase with the dimension of the data. While the points made above are correct, I want to note that, ideally, ABC (and Bayesian inference as a whole) only depends on a one-dimensional observation, namely the likelihood value. Or, more practically, that it only depends on the distance from the observed data to the simulated data (possibly the Wasserstein distance between the empirical cdfs). And that, somewhat unrealistically, ABC could store the reference table once and for all. Point 3 can also be debated, in that the effort of learning an approximation can only be amortized when exactly the same model is re-employed with new data, which is likely in industrial applications but less so in scientific investigations, I would think. About point 2, the paper misses part of the ABC literature on selecting summary statistics, e.g., the culling afforded by random forest ABC, or the earlier use of the score function in Martin et al. (2019).
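As a toy illustration of the two remarks above (rejection ABC only sees the data through a distance, and a reference table can be simulated once and reused), here is a minimal Python sketch. Everything in it is an assumption for illustration: a hypothetical N(θ, 1) simulator, a U(-5, 5) prior, and the 1-Wasserstein distance between empirical cdfs, which for two samples of equal size reduces to the mean absolute gap between sorted values. The function names are made up for the example.

```python
import numpy as np


def simulate(theta, n_obs, rng):
    # Hypothetical toy simulator: each row holds n_obs i.i.d. draws from N(theta, 1)
    theta = np.atleast_1d(theta)
    return rng.normal(theta[:, None], 1.0, size=(theta.size, n_obs))


def build_reference_table(n_sims, n_obs, rng):
    # Simulated once and stored; reusable for any later dataset of size n_obs.
    # Rows are sorted because the 1-D Wasserstein distance only needs order statistics.
    thetas = rng.uniform(-5.0, 5.0, size=n_sims)  # hypothetical U(-5, 5) prior
    table = np.sort(simulate(thetas, n_obs, rng), axis=1)
    return thetas, table


def wasserstein_1d(x_sorted, table_sorted):
    # W1 between empirical distributions of equal size:
    # mean absolute gap between matching order statistics (one value per table row)
    return np.mean(np.abs(table_sorted - x_sorted), axis=1)


def abc_rejection(x_obs, thetas, table, quantile=0.01):
    # Keep the prior draws whose simulated samples are closest to x_obs;
    # the observed data enter only through this single distance
    dists = wasserstein_1d(np.sort(x_obs), table)
    eps = np.quantile(dists, quantile)
    return thetas[dists <= eps]


rng = np.random.default_rng(0)
thetas, table = build_reference_table(20_000, 50, rng)

# The same stored table serves several observed datasets
for true_theta in (-1.0, 2.0):
    x_obs = simulate(true_theta, 50, rng)[0]
    post = abc_rejection(x_obs, thetas, table)
    print(f"theta={true_theta:+.1f}  ABC mean={post.mean():+.2f}  sd={post.std():.2f}")
```

The reuse only holds while the model, prior, and sample size stay exactly the same, which is the same restriction as the amortization argument in point 3.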

The paper then makes a case for using machine-, active-, and deep-learning advances to overcome those blocks. Overlapping with other recent publications and talks (like Dennis on One World ABC’minar!). Once again presenting machine-learning techniques such as normalizing flows as more efficient than traditional non-parametric estimators. Of which I remain unconvinced without deeper arguments [than the repeated mention of powerful machine-learning techniques] on the convergence rates of these estimators (rather than extolling the super-powers of neural nets).

“A classifier is trained using supervised learning to discriminate two sets of data, although in this case both sets come from the simulator and are generated for different parameter points θ⁰ and θ¹. The classifier output function can be converted into an approximation of the likelihood ratio between θ⁰ and θ¹ (…) learning the likelihood or posterior is an unsupervised learning problem, whereas estimating the likelihood ratio through a classifier is an example of supervised learning and often a simpler task.”
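The classifier-based ratio estimate in the quote can be illustrated in a few lines. This sketch is mine, not the authors': it assumes a hypothetical simulator N(θ, 1) with θ⁰ = 0 and θ¹ = 1, and swaps the neural network for a plain logistic regression fitted by gradient descent, with features chosen so that the exact log-ratio is representable. With equal numbers of simulations at each parameter point, the logit of the classifier output estimates log p(x|θ¹)/p(x|θ⁰).

```python
import numpy as np


def simulate(theta, n, rng):
    # Hypothetical toy simulator: n i.i.d. draws from N(theta, 1)
    return rng.normal(theta, 1.0, size=n)


def features(x):
    # Intercept plus x: for two unit-variance Gaussians the exact
    # log-ratio is linear in x, so this model can represent it
    x = np.atleast_1d(x)
    return np.column_stack([np.ones_like(x), x])


def fit_logistic(X, y, lr=0.5, n_iter=500):
    # Plain gradient descent on the average logistic (cross-entropy) loss
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w


rng = np.random.default_rng(0)
theta0, theta1, n = 0.0, 1.0, 20_000

# Both classes come from the simulator, at two different parameter points
x0 = simulate(theta0, n, rng)  # label 0
x1 = simulate(theta1, n, rng)  # label 1
X = features(np.concatenate([x0, x1]))
y = np.concatenate([np.zeros(n), np.ones(n)])
w = fit_logistic(X, y)


def log_ratio_hat(x):
    # With balanced classes, s(x) = P(y=1 | x) = p1 / (p0 + p1), so the
    # classifier logit log(s / (1 - s)) estimates log p(x|theta1) / p(x|theta0)
    return features(x) @ w


def log_ratio_true(x):
    # Exact Gaussian log-ratio, available only because the toy model is tractable
    return (theta1 - theta0) * x - (theta1**2 - theta0**2) / 2


for x in (-1.0, 0.5, 2.0):
    print(f"x={x:+.1f}  estimated={log_ratio_hat(x)[0]:+.3f}  exact={log_ratio_true(x):+.3f}")
```

The comparison with the exact log-ratio is only possible here because the toy model is tractable; with a flexible classifier on a realistic simulator, no such check is directly available.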

The above comment is closely connected to the approach set out by Geyer (1994) and expanded by Gutmann and Hyvärinen (2012). Interestingly (at least from my narrow statistician viewpoint!), the discussion about using these different types of approximation to the likelihood, and hence to the resulting Bayesian inference, never engages in a quantification of the approximation, nor even broaches the potential for inconsistent inference unlocked by using fake likelihoods. All the while insisting on the information loss brought by using summary statistics.

“Can the outcome be trusted in the presence of imperfections such as limited sample size, insufficient network capacity, or inefficient optimization?”

Interestingly [all the more since the paper is classified as statistics], the above shows that the statistical question is instead framed in terms of numerical error(s), with proposals to address it ranging from (unrealistic) parametric bootstrap to some forms of GANs.

one of the surprising aspects of the analyses and commentaries on the Covid-19 epidemic is the absence of statistics

Posted in Statistics, University life on May 6, 2020 by xi'an

From one French demographer (INED) in Le Monde [my translation], with a clustering of French departments into three classes [the figures on the above map are the lags after the first death in Haut-Rhin]:

One of the surprising aspects of the analyses and commentaries on the Covid-19 epidemic is the absence of statistics. Every evening, however, we are bombarded with figures, and many sites, from Public Health France (SpF) to Johns-Hopkins University (Maryland), abound in data.

But a number carries a meaning only in reference to other figures. This is where the real statistics start. However, apart from comparing the number of contagions and deaths by country and date, little has been learned from the data, which could provide useful information on the nature and progression of the epidemic (…)

We can see that the diversity of close contacts is one of the keys to the evolution of the epidemic. Instead of reasoning in terms of abstract coefficients such as the famous average number R₀ of contagions per person, we should be able to delve into the details of these contagions. We see here that traffic routes, institutions and housing probably hold a strategic position in any explanation.

This analysis is inevitably limited by the nature of the data and their possible flaws. It would be useful to collect more detailed information on the nature of the contacts of each new case of contagion and to analyze it, or even to carry out random surveys with Covid-19 testing: in a word, to do statistics.

assistant/associate professor position in statistics/machine-learning at ENSAE

Posted in pictures, Statistics, Travel, University life on March 10, 2020 by xi'an

ENSAE (my Alma Mater) is opening a new position for next semester in statistics and/or machine learning. At the Assistant Professor level, the position is for an initial three-year term, renewable for another three years, before the tenure evaluation. The school is located on the Université Paris-Saclay campus, only teaches at the Master and PhD levels, and the deadline for application is 31 March 2020. Details and contacts on the call page.

summer internships at Warwick

Posted in Kids, pictures, Statistics, Travel, University life on December 16, 2019 by xi'an