Archive for MASH

MASH in Le Monde

Posted in Statistics with tags , , , , , , , , on January 25, 2019 by xi'an

done! [#2]

Posted in Kids, Statistics, University life with tags , , , , , , , , , on January 21, 2016 by xi'an

exosPhew! I just finished my enormous pile of homeworks for the computational statistics course… This massive pile is due to an unexpected number of students registering for the Data Science Master at ENSAE and Paris-Dauphine. As I was not aware of this surge, I kept to my practice of asking students to hand back solved exercises from Monte Carlo Statistical Methods at the beginning of each class. And could not change the rules of the game once the course had started! Next year, I’ll make sure to get some backup for grading those exercises. Or go for group projects instead…

methods for quantifying conflict casualties in Syria

Posted in Books, Statistics, University life with tags , , , , , , , , , , on November 3, 2014 by xi'an

On Monday November 17, 11am, Amphi 10, Université Paris-Dauphine,  Rebecca Steorts from CMU will give a talk at the GT Statistique et imagerie seminar:

Information about social entities is often spread across multiple large databases, each degraded by noise, and without unique identifiers shared across databases.Entity resolution—reconstructing the actual entities and their attributes—is essential to using big data and is challenging not only for inference but also for computation.

In this talk, I motivate entity resolution by the current conflict in Syria. It has been tremendously well documented, however, we still do not know how many people have been killed from conflict-related violence. We describe a novel approach towards estimating death counts in Syria and challenges that are unique to this database. We first introduce computational speed-ups to avoid all-to-all record comparisons based upon locality-sensitive hashing from the computer science literature. We then introduce a novel approach to entity resolution by discovering a bipartite graph, which links manifest records to a common set of latent entities. Our model quantifies the uncertainty in the inference and propagates this uncertainty into subsequent analyses. Finally, we speak to the success and challenges of solving a problem that is at the forefront of national headlines and news.

This is joint work with Rob Hall (Etsy), Steve Fienberg (CMU), and Anshu Shrivastava (Cornell University).

[Note that Rebecca will visit the maths department in Paris-Dauphine for two weeks and give a short course in our data science Master on data confidentiality, privacy and statistical disclosure (syllabus).]