Bayes for good

A very special weekend workshop on Bayesian techniques used for social good, in many different senses (and talks), that we organised with Kerrie Mengersen and Pierre Pudlo at CIRM, Luminy, Marseilles. It started with Rebecca (Beka) Steorts (Duke) explaining [by video from Duke] how the Syrian war deaths were processed to eliminate duplicates, a talk to be continued on Monday at the “Big” conference. Followed by Alex Volfovsky (Duke) on a Twitter experiment testing whether being exposed to adverse opinions is depolarising (not!) or further polarising (yes), turning into network causal analysis. And then Kerrie Mengersen (QUT) on the use of Bayesian networks in ecology, through observational studies she conducted. And the role of neutral statisticians in case of adversarial experts!

Next day, the first talk was by David Corliss (Peace-Work), who writes the Stats for Good column in CHANCE and here gave a recruiting spiel for volunteering in good initiatives. Quoting Florence Nightingale as the “first” volunteer. And presenting a broad collection of projects in support of his recommendations for “doing good”. We then heard [by video] Julien Cornebise from Element AI in London telling of his move out of DeepMind towards investing in projects with a social impact through this new startup. Including working with Amnesty International on village destructions in Darfur, building evidence from satellite imaging. And crowdsourcing. With an upcoming report on the year's activities (still under embargo). A most exciting and enthusiastic talk!

The following two talks were by Andrea Tancredi and Brunero Liseo (La Sapienza Roma) on record linkage, in collaboration with Rebecca Steorts. Using a capture-recapture framework. Connecting with the Biometrika paper of Johndrow, Lum and Dunson I discussed a few months ago. But also using a more advanced clustering algorithm based on Pitman-Yor processes. Again with applications to the Syrian war deaths. (And a fairly convincing point on the difficulty of being “objective” in missing data problems.) Jacinta Holloway (QUT) spoke about using satellite data to support the UN and developing countries in targeting issues related to crops, water management, &tc. Using Bayesian additive regression trees (BART) to handle missing data on the pictures. Logan Graham and Brody Fox (Oxford) paired to describe their RAIL initiative. Tackling social challenges by teams of PhD students from (very) different fields. With an application to NYC shelter data. (NYC now has a dedicated data analytics department.) Plus many other fascinating projects. Charles Gray (La Trobe) argued for going further towards end users to reach true usability of statistical and ML methods, solving the toolchain gap. And mentioning unconf in passing. With a further plea for open source. And refactoring of R code. Ethan Goan (QUT) defended the Bayesian side of deep learning. As providing a much needed measure of variability. Reparameterising neural networks so that they become differentiable and hence can use gradients for variational approximation. (Do we really need the variational B part?) And then Andrew Gelman (Columbia) called from New York! Chatting with us about the reproducibility crisis. Distinguishing the issues from the (obvious) cheater cases. Even preregistration is not a panacea. He further argued against the testing approach as unrealistic and even unethical. With the (great) paradoxical statement that the crisis lies with statistics being asked too little.

On Sunday, Cody Ross (Max Planck Institute) discussed the issue of racial bias in police shootings, starting with the difficulty of getting data. And moving to the level of conditioning necessary to assess this bias, given the opposite conclusions of another study by Fryer. The latter conditioning on encounters between police and potential victims, but encounters being much more likely for black people. (With a predictable link with Simpson’s paradox.) Then Bihan Zhuang (Duke) spoke of her work on Pitman-Yor processes for entity resolution, getting away from a Uniform prior, which is actually the topic of her undergraduate thesis. And her first presentation at a conference, which proved quite impressive given how recently she entered this research topic! Em Rushworth (QUT) described the virtual reef diver platform experiment that surveys coral reefs (heading towards their complete destruction by temperature increase). Expanding a Dirichlet model to better process the case of absent species.
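To make the conditioning issue raised in Cody Ross's talk concrete, here is a minimal sketch in Python. The groups "A" and "B" and all counts are hypothetical, made up purely for illustration, and reproduce neither the data nor the models of either study. It only shows how a rate conditioned on encounters can look identical across two groups while the per-capita rate differs, as soon as encounters themselves are unevenly distributed.

```python
import math

# Hypothetical counts, invented for illustration only (not from any study).
groups = {
    "A": {"population": 100_000, "encounters": 1_000, "events": 10},
    "B": {"population": 100_000, "encounters": 5_000, "events": 50},
}

# Rate conditional on an encounter having taken place.
per_encounter = {g: d["events"] / d["encounters"] for g, d in groups.items()}

# Rate relative to the whole population of each group.
per_capita = {g: d["events"] / d["population"] for g, d in groups.items()}

# Ratio of B to A under each choice of denominator.
ratio_per_encounter = per_encounter["B"] / per_encounter["A"]
ratio_per_capita = per_capita["B"] / per_capita["A"]

print(f"per-encounter ratio B/A: {ratio_per_encounter:.2f}")
print(f"per-capita ratio B/A:    {ratio_per_capita:.2f}")
```

With these made-up numbers, conditioning on encounters hides a disparity that sits entirely in who gets encountered in the first place, which is why the choice of denominator can flip the conclusion.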

After the afternoon break, during which it did not rain, it poured!, Tamara Broderick (MIT) showed how Bayesian inference helped in modelling a microcredit experiment. And more generally addressed the issue of robustness. From prior robustness to the impact of approximations like variational Bayes. Using a linear approximation to the cumulant generating function (conveniently linked with the Kullback-Leibler divergence!) to find a better approximation to the posterior variance. Thanks to automatic differentiation. With the drawback (imho) of focusing on the first two moments of the posterior. This tool equally applies to sensitivity to prior modelling and to spotting the most influential hyperparameters. And to fast cross-validation. Making linear response a new Swiss army jackknife! Gajendra Vishwakarma concluded with a talk on an analysis of gene expression data.
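The cumulant generating function connection behind linear response can be checked numerically. If a distribution p is tilted into p_t(θ) ∝ p(θ) exp(tθ), the derivative of the tilted mean at t = 0 is exactly the variance of p. The sketch below verifies this identity on a toy discrete distribution (the support and weights are hypothetical). It is not the variational procedure from the talk, which, as far as I understand, applies the same derivative to the variational approximation and relies on automatic differentiation rather than finite differences.

```python
import math

def tilted_mean(support, weights, t):
    """Mean of theta under p_t(theta) proportional to p(theta) * exp(t * theta)."""
    tilted = [w * math.exp(t * x) for x, w in zip(support, weights)]
    normaliser = sum(tilted)
    return sum(x * w for x, w in zip(support, tilted)) / normaliser

# Toy "posterior" on a four-point grid (hypothetical values).
support = [0.0, 1.0, 2.0, 3.0]
weights = [0.1, 0.4, 0.3, 0.2]

# Direct computation of the mean and variance.
mean = tilted_mean(support, weights, 0.0)
variance = sum(w * (x - mean) ** 2 for x, w in zip(support, weights))

# Sensitivity of the mean to the tilt, by central finite difference.
eps = 1e-5
sensitivity = (
    tilted_mean(support, weights, eps) - tilted_mean(support, weights, -eps)
) / (2 * eps)

print(f"variance:    {variance:.6f}")
print(f"sensitivity: {sensitivity:.6f}")
```

The two printed values agree up to finite-difference error, which is the sense in which a variance can be read off from the response of a mean to a small perturbation.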

On Monday morning, Atanu Bhattacharjee (Tata Institute) presented an analysis of metronomic chemotherapy, about the optimal allocation of a cancer drug depending on patient characteristics. Using different generalised linear models and standard Bayesian estimates. Then Farzana Jahan (QUT) looked at Bayesian meta-analysis for analysing (cancer) spatial maps. (Kerrie and her team were instrumental in bringing the on-line Australian cancer atlas to life.) With DuMouchel’s model applied to rank cancer occurrences by local habitation density. (Including a surprisingly higher occurrence of lung cancer in remote areas rather than cities.) David Corliss (Peace-Work) gave his second talk on capture-recapture, which has long been used for populations at risk, as in the famous 1993 prostitution study in Glasgow. (Or my unsuccessful attempt at estimating the number of drug addicts in Greater Paris.) Briefly discussing Ball’s processing of the Bosnian genocide. And spending more time on his current detection of hate speech on Twitter, towards estimating the number of contributors. Antonietta Mira (USI, Lugano) concluded the workshop with a resuscitating talk, about the statistical modelling of cardiac arrests in the Swiss canton of Ticino. With advanced structures (even considering sending drones to drop defibrillators to accidents!) and a dozen associated datasets. Anto and her team looked at optimal allocations of defibrillators. And building risk maps using INLA.
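For readers unfamiliar with capture-recapture, the simplest two-list version fits in a few lines of Python. This sketch uses the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant, not the (Bayesian) models discussed in the talks, and the counts are hypothetical, chosen only for illustration. Here n1 and n2 are the sizes of two independent lists and m is the number of individuals appearing on both.

```python
def lincoln_petersen(n1, n2, m):
    """Classical two-list estimate of the population size: n1 * n2 / m."""
    if m == 0:
        raise ValueError("no overlap between the two lists, estimate undefined")
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected variant, still defined when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts, invented for illustration only.
n1, n2, m = 200, 150, 30

lp_estimate = lincoln_petersen(n1, n2, m)
chapman_estimate = chapman(n1, n2, m)

print(f"Lincoln-Petersen: {lp_estimate:.1f}")
print(f"Chapman:          {chapman_estimate:.1f}")
```

Both estimators assume a closed population, independent lists and perfect matching between lists, which is exactly where the record linkage issues from the earlier talks come back in.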

In conclusion, this was a fantastically diverse and exciting workshop, touching on issues that one rarely sees in “standard” conferences, about communicating with decision makers and the general public, protecting sources at a higher level of urgency than usual, paying attention to the potentially lethal misuses of the published studies, &tc. The workshops (within the workshop) that we ran every day should produce a chapter of recommendations on conducting Bayes [analysis] for good in the book associated with Kerrie’s Jean Morlet chair.

[As an aside, I was going to add (and Good for Bayes) in the title, but concluded that this pun about I.J. Good’s name was rather misplaced.]

