Archive for record linkage

Bayes for good

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on November 27, 2018 by xi'an

A very special weekend workshop on Bayesian techniques used for social good, in many different senses (and talks), which we organised with Kerrie Mengersen and Pierre Pudlo at CiRM, Luminy, Marseilles. It started with Rebecca (Beka) Steorts (Duke) explaining [by video from Duke] how the Syrian war deaths were processed to eliminate duplicates, a talk to be continued on Monday at the “Big” conference. Then Alex Volfovsky (Duke) spoke on a Twitter experiment testing whether being exposed to adverse opinions is depolarising (not!) or further polarising (yes), turning into network causal analysis. And then Kerrie Mengersen (QUT) on the use of Bayesian networks in ecology, through observational studies she conducted. And on the role of neutral statisticians in the presence of adversarial experts!

The next day opened with a talk by David Corliss (Peace-Work), who writes the Stats for Good column in CHANCE and here gave a recruiting spiel for volunteering in good initiatives, quoting Florence Nightingale as the “first” volunteer and presenting a broad collection of projects in support of his recommendations for “doing good”. We then heard [by video] Julien Cornebise from Element AI in London telling of his move out of DeepMind towards investing in socially impactful projects through this new startup, including working with Amnesty International on village destructions in Darfur, building evidence from satellite imaging and crowdsourcing. With an upcoming report on the year's activities (still under embargo). A most exciting and enthusiastic talk!



Posted in Statistics, University life on August 24, 2015 by xi'an

Two items of news reached my mailbox at about the same time: my friends and CMU coauthors Rebecca (Beka) Steorts and Steve Fienberg both received a major award in the past few days. Congrats to both of them!!! At JSM 2015, Steve got the 2015 Jerome Sacks Award for Cross-Disciplinary Research “for a remarkable career devoted to the development and application of statistical methodology to solve problems for the benefit of society, including aspects of human rights, privacy and confidentiality, forensics, survey and census-taking, and more; and for exceptional leadership in a variety of professional and governmental organizations, including in the founding of NISS.” The Award is presented by the National Institute of Statistical Sciences (NISS) in honour of Jerry Sacks. And Beka has been selected as one of the 35 innovators under 35 for 2015, a list published yearly by the MIT Technology Review, in particular for her record-linkage work on estimating the number of casualties in the Syrian civil war. (This led the Review to classify her as a humanitarian rather than among the visionaries, a list that includes two other machine learners.) Great!

JSM 2014, Boston [#4]

Posted in Books, Statistics, Travel, University life on August 9, 2014 by xi'an

Last day and final post at and about JSM 2014! It is very rare that I stay till the last day and it is solely due to family constraints that I attended the very last sessions. It was a bit eerie, walking through the huge structure of the Boston Convention Centre, which could easily house several A380s, and meeting a few souls dragging a suitcase to the mostly empty rooms… Getting scheduled on the final day of the conference is not the nicest thing and I offer my condolences to all speakers who ended up speaking today! Including my former Master student Anne Sabourin.

I first attended the Frontiers of Computer Experiments: Big Data, Calibration, and Validation session, with a talk by David Higdon on the extrapolation limits of computer models, a talk that linked very nicely with Stephen Stigler’s Presidential Address and stressed the need for incorporating the often neglected fact that models are not reality. Jarad Niemi also presented an approximate way of handling Gaussian process modelling for large datasets. It was only natural to link this talk with David’s and wonder about the extrapola-bility of the modelling, the risk of over-fitting, and the potential for detecting sudden drops in the function.

The major reason why I made the one-hour trip back to the Boston Convention Centre was however the Human Rights Violations: How Do We Begin Counting the Dead? session. It was of direct interest to me, as I had wondered in the past days about statistically assessing the number of political kidnappings and murders in Eastern Ukraine. It was also of methodological relevance, as the techniques were connected with capture-recapture and random forests. And I had close connections with two of the speakers, who alas could not make it and were replaced by co-authors. The first talk, by Samuel Ventura, considered ways of accelerating the comparison of entries across multiple lists for identifying unique individuals, with the open methodological question of handling populations of probabilities, as output by random forests. My virtual question related to this talk was why the causes for duplications and errors in the records were completely ignored. At least in the example of the Syrian deaths, some analysis could be conducted on the reasons for differences in the entries, and maybe a prior model constructed. The second talk, by Daniel Manrique-Vallier, was about using non-parametric capture-recapture to count the number of dead from several lists, once again bypassing the use of potential covariates for explaining the differences. As I noticed a while ago when analysing the population of (police) captured drug addicts in Greater Paris, the prior modelling has a strong impact on the estimated population. Another point I would have liked to discuss was the repeated argument that Arabic (script?) made the identification of individuals more difficult: my naïve reaction was to wonder whether or not this was due to the absence of fluent Arabic speakers in the team, who could have helped build a model of the potential alternative spellings and derivations of Arabic names. But I may have missed more subtle difficulties.
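To fix ideas on how capture-recapture turns overlapping lists into a population count, here is a minimal sketch of the simplest two-list case, using Chapman's bias-corrected version of the Lincoln-Petersen estimator. This is *not* the non-parametric multiple-list method presented in the session. It assumes a closed population, independent lists, and records already deduplicated and perfectly linked to a common identifier, which are exactly the assumptions the record-linkage and covariate questions above call into question. The function names and toy identifiers are hypothetical and for illustration only.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of total population size.

    n1: number of distinct individuals on list 1
    n2: number of distinct individuals on list 2
    m:  number of individuals matched on both lists
    Assumes a closed population and independent lists (illustrative sketch).
    """
    if min(n1, n2, m) < 0 or m > min(n1, n2):
        raise ValueError("need 0 <= m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def lists_to_counts(list1, list2):
    """Return (n1, n2, m), assuming records are already linked to a
    common (hypothetical) individual identifier, i.e. perfect linkage."""
    s1, s2 = set(list1), set(list2)
    return len(s1), len(s2), len(s1 & s2)


# Toy identifiers, purely illustrative (not real data).
list_a = ["id01", "id02", "id03", "id04", "id05", "id06"]
list_b = ["id04", "id05", "id06", "id07", "id08"]

n1, n2, m = lists_to_counts(list_a, list_b)
print(n1, n2, m)                    # 6 5 3
print(chapman_estimate(n1, n2, m))  # 9.5
```

Note how directly the estimate depends on the matched count m: a linkage error that wrongly merges or splits a single record shifts the denominator, which is why the quality of deduplication matters so much for the final count of the dead.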