Archive for CREST

Dorfman’s group testing

Posted in Books, Kids, Statistics, Travel on April 9, 2020 by xi'an

A recent note by CREST researchers insists on using group testing to compensate for the shortage of test packages and testing personnel, as done in several countries, towards deconfining individuals who are not infected. Or who are exhibiting the right antibodies. Reminding me of my first encounter with the notion, in Feller’s book, of the method implemented by Robert Dorfman to test for syphilis prior to enlisting potential WW II soldiers. I would deem the idea useful for surveys, in identifying the proportion of infected or immunised persons, maybe less for giving the green light to leave one’s house, as the logistics of merging the tests while keeping track of every individual could prove impossible.
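As a reminder of why pooling saves tests, here is a minimal sketch of the textbook two-stage scheme, under assumptions that are mine rather than the note's: independent infections with prevalence p, a perfect test, and every member of a positive pool retested individually. The expected number of tests per person is then 1/k + 1 - (1-p)^k for a pool of size k. The function names are purely illustrative.

```python
def expected_tests_per_person(p: float, k: int) -> float:
    """Expected tests per individual for two-stage pooling with pool size k.

    Assumes independent infections with prevalence p and an error-free test.
    """
    if k == 1:
        # A pool of one is plain individual testing.
        return 1.0
    # One pooled test shared by k people, plus one individual retest each
    # whenever the pool is positive, which has probability 1 - (1 - p)^k.
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_pool_size(p: float, max_k: int = 100) -> int:
    """Pool size in 1..max_k minimising the expected tests per individual."""
    return min(range(1, max_k + 1), key=lambda k: expected_tests_per_person(p, k))


if __name__ == "__main__":
    for prevalence in (0.01, 0.05, 0.5):
        k = best_pool_size(prevalence)
        cost = expected_tests_per_person(prevalence, k)
        print(f"p={prevalence}: pool size {k}, {cost:.3f} tests per person")
```

With a prevalence of 1%, this sketch picks pools of 11 and needs about a fifth of the tests required by individual testing, while at a prevalence of 50% pooling no longer helps and the best pool size is 1. None of this addresses the logistics of merging samples and tracking individuals.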

SMC 2020 [en Madrid]

Posted in pictures, Statistics, Travel, University life on January 30, 2020 by xi'an

Palacio Real from Casa del Campo, on Nov. 10, 2011, during a pleasant bike ride through the outskirts of Madrid and along the renovated banks of Rio Manzanares

An announcement for the incoming SMC 2020 workshop, taking place in Madrid next May 27-29! The previous workshops were in Paris in 2015 (at ENSAE-CREST) and Uppsala in 2017. This workshop is organised by my friends Víctor Elvira and Joaquín Míguez. With very affordable registration fees and an open call for posters. Here are the invited speakers (so far):

Deniz Akyildiz (University of Warwick)
Christophe Andrieu (University of Bristol)
Nicolas Chopin (ENSAE-CREST)
Dan Crisan (Imperial College London)
Jana de Wiljes (University of Potsdam)
Pierre Del Moral (INRIA)
Petar M. Djuric (Stony Brook University)
Randal Douc (TELECOM SudParis)
Arnaud Doucet (University of Oxford)
Ajay Jasra (National University of Singapore)
Nikolas Kantas (Imperial College London)
Simon Maskell (University of Liverpool)
Lawrence Murray (Uber AI)
François Septier (Université Bretagne Sud)
Sumeetpal Singh (University of Cambridge)
Arno Solin (Aalto University)
Matti Vihola (University of Jyväskylä)
Anna Wigren (Uppsala University)

Korean trip

Posted in Mountains, Running, Statistics, Travel, University life on November 24, 2019 by xi'an

A fairly short but exciting trip to Seoul and to the Fall meeting of the Korean Statistical Society there. Plus giving a seminar at Seoul National University, where I stayed and enjoyed its beautiful campus surrounded by hills painted in the flamboyant reds and yellows of trees. Running to the top of Gwanaksan in the early morning, with some scrambling moments, was a fantastic beginning for the day! Although it was quite unintentional, Sacha Tsybakov from CREST happened to be another invited speaker at the meeting (along with Regina Liu from Rutgers, whom I had also met in Salzburg two months ago) and we had a nice stroll together on the University of Seoul campus during a break in the sessions, gaining another view of the city from the top of the Bukhasan mountain. The talk I gave there on the asymptotics of ABC happened to be better attended than my tutorial lecture delivered at the beginning of JSM in Denver this summer. I am thus quite grateful to the organisers for their invitation and this opportunity to meet Korean statisticians and to get a glimpse of Korean culture and cuisine!


Christian Robert is giving a talk in Jussieu tomorrow

Posted in Statistics, University life on September 26, 2019 by xi'an

My namesake Christian (Yann) Robert (CREST) is giving a seminar tomorrow in Jussieu (Université Pierre & Marie Curie, couloir 16-26, salle 209), between 2 and 3, on a composite likelihood estimation method for hierarchical Archimedean copulas defined with multivariate compound distributions. Here is the abstract:

We consider the family of hierarchical Archimedean copulas obtained from multivariate exponential mixture distributions through compounding, as introduced by Cossette et al. (2017). We investigate ways of determining the structure of these copulas and estimating their parameters. An agglomerative clustering technique based on the matrix of Spearman’s rhos, combined with a bootstrap procedure, is used to identify the tree structure. Parameters are estimated through a top-down composite likelihood. The validity of the approach is illustrated through two simulation studies in which the procedure is explained step by step. The composite likelihood method is also compared to the full likelihood method in a simple case where the latter is computable.

on anonymisation

Posted in Books, pictures, Statistics, University life on August 2, 2019 by xi'an

An article in the New York Times covering a recent publication in Nature Communications on the ability to identify 99.98% of Americans from almost any dataset with fifteen covariates. And mentioning the French approach of INSEE, more precisely CASD (a branch of GENES, as are ENSAE and CREST, to which I am affiliated), where my friend Antoine worked for a few years, and whose approach is to vet researchers who want access to non-anonymised data, by creating local working environments on the CASD machines so that data does not leave the site. The approach is to provide the researcher with a dedicated interface, which “enables access remotely to a secure infrastructure where confidential data is safe from harm”. It further delivers reproducibility certificates for publications, a point apparently missed by the New York Times, which puts forward the lack of reproducibility as a drawback of the method. It also mentions the possibility of doing cryptographic data analysis, again missing the finer details with a lame objection.

“Our paper shows how the likelihood of a specific individual to have been correctly re-identified can be estimated with high accuracy even when the anonymized dataset is heavily incomplete.”

The Nature paper is actually about the probability for an individual to be uniquely identified from the given dataset, which is somewhat different from the NYT headlines. Using a copula for the distribution of the covariates. And assessing the model with a mean square error evaluation when what matters are false positives and false negatives. Note that the model needs to be trained for each new dataset, which reduces the appeal of the claim, especially when considering that about 6% of the individuals tagged as uniquely identified are not. The statistic of 99.98% posted in the NYT is actually a count on a specific dataset, the 5% Public Use Microdata Sample files, and Massachusetts residents, and not a general statistic [which would not make much sense!, as I can easily imagine 15 useless covariates] or prediction from the authors’ model. And a wee bit anticlimactic.
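To make the notion of a count on a specific dataset concrete, here is a minimal sketch of an empirical uniqueness rate, namely the share of records whose combination of covariates appears only once in the file. This is not the copula model of the paper, and the toy records and column names (zip, birth_year, sex) are hypothetical, chosen only for illustration.

```python
from collections import Counter


def fraction_unique(records, columns):
    """Share of records whose values on `columns` occur exactly once."""
    if not records:
        return 0.0
    # Reduce each record to the tuple of covariates under consideration.
    keys = [tuple(record[c] for c in columns) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Hypothetical toy file, standing in for a real microdata sample.
toy_records = [
    {"zip": "02139", "birth_year": 1980, "sex": "F"},
    {"zip": "02139", "birth_year": 1980, "sex": "M"},
    {"zip": "02139", "birth_year": 1975, "sex": "F"},
    {"zip": "02140", "birth_year": 1980, "sex": "F"},
]

if __name__ == "__main__":
    for cols in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
        print(cols, fraction_unique(toy_records, cols))
```

On this toy file the rate goes from 0.25 to 0.5 to 1.0 as covariates are added, but only because these covariates are informative: a column taking the same value for everyone would leave the rate unchanged, which is the point about 15 useless covariates.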