Archive for CREST

SMC 2020 [en Madrid]

Posted in pictures, Statistics, Travel, University life on January 30, 2020 by xi'an

Palacio Real from Casa del Campo, on Nov. 10, 2011, during a pleasant bike ride through the outskirts of Madrid and along the renovated banks of Rio Manzanares

An announcement for the upcoming SMC 2020 workshop, taking place in Madrid on 27-29 May! The previous workshops were in Paris in 2015 (at ENSAE-CREST) and Uppsala in 2017. This workshop is organised by my friends Víctor Elvira and Joaquín Míguez. With very affordable registration fees and an open call for posters. Here are the invited speakers (so far):

Deniz Akyildiz (University of Warwick)
Christophe Andrieu (University of Bristol)
Nicolas Chopin (ENSAE-CREST)
Dan Crisan (Imperial College London)
Jana de Wiljes (University of Potsdam)
Pierre Del Moral (INRIA)
Petar M. Djuric (Stony Brook University)
Randal Douc (TELECOM SudParis)
Arnaud Doucet (University of Oxford)
Ajay Jasra (National University of Singapore)
Nikolas Kantas (Imperial College London)
Simon Maskell (University of Liverpool)
Lawrence Murray (Uber AI)
François Septier (Université Bretagne Sud)
Sumeetpal Singh (University of Cambridge)
Arno Solin (Aalto University)
Matti Vihola (University of Jyväskylä)
Anna Wigren (Uppsala University)

Korean trip

Posted in Mountains, Running, Statistics, Travel, University life on November 24, 2019 by xi'an

A fairly short but exciting trip to Seoul and to the Fall meeting of the Korean Statistical Society there. Plus giving a seminar at Seoul National University, where I stayed and enjoyed its beautiful campus surrounded by hills painted in the flamboyant reds and yellows of trees. Running to the top of Gwanaksan in the early morning, with some scrambling moments, was a fantastic beginning for the day! Although it was quite unintentional, Sacha Tsybakov from CREST happened to be another invited speaker at the meeting (along with Regina Liu from Rutgers, whom I also met in Salzburg two months ago) and we had a nice stroll together on the University of Seoul campus during a break in the sessions, gaining another view of the city from the top of the Bukhasan mountain. The talk I gave there on the asymptotics of ABC happened to be better attended than my tutorial lecture delivered at the beginning of JSM in Denver this summer. I am thus quite grateful to the organisers for their invitation and this opportunity to meet Korean statisticians and to get a glimpse of Korean culture and cuisine!

 

Christian Robert is giving a talk in Jussieu tomorrow

Posted in Statistics, University life on September 26, 2019 by xi'an

My namesake Christian (Yann) Robert (CREST) is giving a seminar tomorrow in Jussieu (Université Pierre & Marie Curie, couloir 16-26, salle 209), between 2 and 3, on a composite likelihood estimation method for hierarchical Archimedean copulas defined with multivariate compound distributions. Here is the abstract:

We consider the family of hierarchical Archimedean copulas obtained from multivariate exponential mixture distributions through compounding, as introduced by Cossette et al. (2017). We investigate ways of determining the structure of these copulas and estimating their parameters. An agglomerative clustering technique based on the matrix of Spearman’s rhos, combined with a bootstrap procedure, is used to identify the tree structure. Parameters are estimated through a top-down composite likelihood. The validity of the approach is illustrated through two simulation studies in which the procedure is explained step by step. The composite likelihood method is also compared to the full likelihood method in a simple case where the latter is computable.

on anonymisation

Posted in Books, pictures, Statistics, University life on August 2, 2019 by xi'an

An article in the New York Times covering a recent publication in Nature Communications on the ability to identify 99.98% of Americans from almost any dataset with fifteen covariates. And mentioning the French approach of INSEE, more precisely CASD (a branch of GENES, like ENSAE and CREST, with which I am affiliated), where my friend Antoine worked for a few years, and whose approach is to vet researchers who want access to non-anonymised data, by creating local working environments on the CASD machines so that data does not leave the site. The approach is to provide the researcher with a dedicated interface, which “enables access remotely to a secure infrastructure where confidential data is safe from harm”. It further delivers reproducibility certificates for publications, a point apparently missed by the New York Times, which puts forward the lack of reproducibility as a drawback of the method. It also mentions the possibility of doing cryptographic data analysis, again missing the finer details with a lame objection.

“Our paper shows how the likelihood of a specific individual to have been correctly re-identified can be estimated with high accuracy even when the anonymized dataset is heavily incomplete.”

The Nature paper is actually about the probability for an individual to be uniquely identified from the given dataset, which is somewhat different from the NYT headlines. Using a copula for the distribution of the covariates. And assessing the model with a mean square error evaluation when what matters are false positives and false negatives. Note that the model needs to be trained for each new dataset, which reduces the appeal of the claim, especially when considering that about 6% of the individuals tagged as uniquely identified are not. The statistic of 99.98% posted in the NYT is actually a count on a specific dataset, the 5% Public Use Microdata Sample files, and Massachusetts residents, and not a general statistic [which would not make much sense!, as I can easily imagine 15 useless covariates] or prediction from the authors’ model. And a wee bit anticlimactic.
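To make the notion of a count on a specific dataset concrete, here is a minimal sketch of how the proportion of uniquely identified records could be computed for a chosen set of covariates. The toy records, the covariate names and the function `fraction_unique` are all hypothetical illustrations and have nothing to do with the PUMS files or with the authors' copula model.

```python
from collections import Counter

def fraction_unique(records, covariates):
    """Share of records whose combination of covariate values appears only once."""
    keys = [tuple(record[c] for c in covariates) for record in records]
    counts = Counter(keys)
    n_unique = sum(1 for key in keys if counts[key] == 1)
    return n_unique / len(keys)

# Hypothetical toy dataset (assumption: made up for illustration only).
toy_records = [
    {"zip": "02139", "birth_year": 1980, "sex": "F"},
    {"zip": "02139", "birth_year": 1980, "sex": "M"},
    {"zip": "02139", "birth_year": 1975, "sex": "F"},
    {"zip": "02140", "birth_year": 1980, "sex": "F"},
    {"zip": "02139", "birth_year": 1980, "sex": "F"},
]

# The proportion of unique records can only grow as covariates are added.
share_one = fraction_unique(toy_records, ["zip"])
share_two = fraction_unique(toy_records, ["zip", "birth_year"])
share_three = fraction_unique(toy_records, ["zip", "birth_year", "sex"])
```

Such a count depends entirely on the dataset and on the covariates retained, which is why it cannot be read as a general statistic.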

noise contrastive estimation

Posted in Statistics on July 15, 2019 by xi'an

As I was attending Lionel Riou-Durand’s PhD thesis defence in ENSAE-CREST last week, I had a look at his papers (!). The 2018 noise contrastive paper is written with Nicolas Chopin (both authors share the CREST affiliation with me). It offers a comparison with Charlie Geyer’s 1994 approach, which bypasses the intractable normalising constant problem by virtue of an artificial logit model with additional simulated data from another distribution ψ.

“Geyer (1994) established the asymptotic properties of the MC-MLE estimates under general conditions; in particular that the x’s are realisations of an ergodic process. This is remarkable, given that most of the theory on M-estimation (i.e. estimation obtained by maximising functions) is restricted to iid data.”

Michael Gutmann and Aapo Hyvärinen also use additional simulated data in another likelihood, that of a logistic classifier, an approach called noise contrastive estimation. Both methods replace the unknown ratio of normalising constants with an unbiased estimate based on the additional simulated data. The major and impressive result in this paper [now published in the Electronic Journal of Statistics] is that the noise contrastive estimation approach always enjoys a smaller variance than Geyer’s solution, at an equivalent computational cost, when the actual data observations are iid and the artificial data simulations ergodic. The difference between both estimators is however negligible against the Monte Carlo error (Theorem 2).
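As an illustration of the logistic classifier idea, here is a minimal sketch of noise contrastive estimation on a toy problem of my own choosing, not an example taken from the paper. The assumptions are: the unnormalised model is exp(a·x − x²/2), the log normalising constant b is treated as a free parameter, the noise distribution ψ is N(0, 2²), and the objective is maximised by plain gradient ascent. All function names are hypothetical.

```python
import math
import random

def log_noise_density(x, scale=2.0):
    """Log density of the noise distribution psi, assumed here to be N(0, scale^2)."""
    return -0.5 * (x / scale) ** 2 - math.log(scale) - 0.5 * math.log(2 * math.pi)

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def nce_fit(data, noise, steps=600, lr=0.5):
    """Fit log p(x) = a*x - x^2/2 + b by classifying data against noise.

    The parameter b stands for the unknown log normalising constant and is
    estimated jointly with a, as in noise contrastive estimation.
    """
    a, b = 0.0, 0.0
    total = len(data) + len(noise)
    log_ratio = math.log(len(data) / len(noise))
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x in data:  # points labelled "data": gradient of log sigmoid(g)
            g = a * x - 0.5 * x * x + b - log_noise_density(x) + log_ratio
            w = 1.0 - sigmoid(g)
            grad_a += w * x
            grad_b += w
        for y in noise:  # points labelled "noise": gradient of log(1 - sigmoid(g))
            g = a * y - 0.5 * y * y + b - log_noise_density(y) + log_ratio
            w = -sigmoid(g)
            grad_a += w * y
            grad_b += w
        a += lr * grad_a / total
        b += lr * grad_b / total
    return a, b

random.seed(0)
observations = [random.gauss(1.0, 1.0) for _ in range(1000)]  # true model is N(1, 1)
simulations = [random.gauss(0.0, 2.0) for _ in range(1000)]   # draws from psi
a_hat, b_hat = nce_fit(observations, simulations)
# For N(1, 1), the targets are a = 1 and b = -1/2 - log(2*pi)/2.
```

Writing the model in its natural parameters makes the classifier logit linear in (a, b), so the objective is concave and a crude gradient ascent suffices for this sketch.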

This may be a rather naïve question, but I wonder at the choice of the alternative distribution ψ. With a vague notion that it could be optimised from a GAN perspective. A side result of interest in the paper is to provide a minimal (re)parameterisation of the truncated multivariate Gaussian distribution, if only as an exercise for future exams. Truncated multivariate Gaussian for which the normalising constant is of course unknown.

Ph.D. scholarships at ENSAE ParisTech‐CREST

Posted in Statistics on April 2, 2019 by xi'an

ENSAE ParisTech and CREST are currently inviting applications for 3-year PhD scholarships in statistics (and economics, finance, and sociology). There is no constraint of nationality or curriculum, but the supervisor must be from ENSAE (Paris-Saclay) or ENSAI (Rennes-Bruz). Applications are due by May 1 and should be sent to Mrs Fanda Traore, at ensae.fr.

Applications should be submitted (in French or in English), including:
– a curriculum vitae;
– a statement of research and teaching interests (10 pages);
– a cover letter;
– the official transcripts of all higher education institutions from which you received a degree;
– recommendation letters from professors, including a letter from the Ph.D. supervisor.

Selected candidates will most likely be interviewed at ENSAE‐CREST.

position in statistics and/or machine learning at ENSAE ParisTech‐CREST

Posted in pictures, University life on March 28, 2019 by xi'an

ENSAE ParisTech and CREST are currently inviting applications for a position of Assistant or Associate Professor in Statistics or Machine Learning.

The appointment starts in September 2019 at the earliest. At the level of Assistant Professor, the position is for an initial three-year term, renewable for another three years before the tenure evaluation. Salary is competitive according to qualifications. The teaching duties are reduced compared to French university standards. At the time of appointment, knowledge of French is not required, but it is expected that the appointee will acquire a workable knowledge of French within a reasonable time.

Candidate Profile

– PhD in Statistics or Machine Learning.
– Outstanding research, including subjects in high-dimensional statistics and machine learning.
– Publications in leading international journals in Statistics or leading outlets in Machine Learning.

– Demonstrated ability to teach courses in Mathematics, Statistics and Machine Learning for engineers and to supervise projects in Applied Statistics. The successful candidate is expected to teach at least one course in mathematics, applied mathematics or introductory statistics at the undergraduate level, and one course in the “Data Science, Statistics and Machine Learning” specialization track during the third year of ENSAE (Master level).

Applications should be submitted (in French or in English) by email to recruitment@ensae.fr:
– Curriculum vitae;
– Statement of research and teaching interests (2-4 pages);
– Names and addresses of three or more individuals willing to provide letters of reference.

Deadline for applications: April 29, 2019.
Selected candidates will be invited to present their work and project at ENSAE‐CREST.