Archive for crowd-sourcing

crowdsourcing, data science & machine learning to measure violence & abuse against women on twitter

Posted in Books, Statistics, University life with tags , , , , , , , , , on January 3, 2019 by xi'an

Amnesty International just released on December 18 a study on abuse and harassment on twitter account of female politicians and journalists in the US and the UK. Realised through the collaboration of thousands of crowdsourced volunteers labeling  tweets from the database and the machine-learning expertise of the London branch of ElementAI, branch driven by my friend Julien Cornebise with the main purpose of producing AI for good (as he explained at the recent Bayes for good workshop). Including the development of an ML tool to detect abusive tweets, called Troll Patrol [which pun side is clear in French!]. The amount of abuse exposed by this study and the possibility to train AIs to spot [some of the] abuse on line are both arguments that support Amnesty International call for the accountability of social media companies like twitter on abuse and violence propagated through their platform. (Methodology is also made available there.)

AISTATS 2016 [#2]

Posted in Kids, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , on May 13, 2016 by xi'an

The second and third days of AISTATS 2016 passed like a blur, with not even the opportunity to write my impressions in real time! Maybe long tapa breaks are mostly to blame for this… In any case, we had two further exciting plenary talks about privacy-preserving data analysis by Kamalika Chaudhuri and crowdsourcing and machine learning by Adam Tauman Kalai. The talk by Kamalika was covering recent results by Kamalika and coauthors about optimal privacy preservation in classification and a generalisation to correlated data, with the neat notion of a Markov Quilt.  Other talks that same day also dwelt on this privacy issue, but I could not be . The talk by Adam was full of fun illustrations on humans training learning systems (with the unsolved difficulty of those humans deliberately mis-training the system, as exhibited recently by the short-lived Microsoft Tay experiment).

Both poster sessions were equally exciting, with the addition of MLSS student posters on the final day. Among many, I particularly enjoyed Iain Murray’s pseudo-marginal slice sampling, David Duvenaud’s fairly intriguing use of early stopping for non-parametric inference,  Garrett Bernstein’s work on aggregated Markov chains, Ye Wang’s scalable geometric density estimation [with a special bonus for his typo on the University of Turing, instead of Torino], Gemma Moran’s and Chengtao Li’s posters on determinantal processes, and Matej Balog’s Mondrian forests with a Laplace kernel [envisioning potential applications for ABC]. Again, just to mention a few…

The participants [incl. myself] also took one evening off to visit a sherry winery in Jerez, with a well-practiced spiel on the story of the company, with some building designed by Gutave Eiffel, and with a wine-tasting session. As I personally find this type of brandy too strong in alcohol, I am not a big fan of sherry but it was nonetheless an amusing trip! With no visible after-effects the next morning, since the audience was as large as usual for Adam’s talk [although I did not cross a machine-learning soul on my 6am run…]

In short, I enjoyed very much AISTATS 2016 and remain deeply impressed by the efficiency of the selection process and the amount of involvement of the actors of this selection, as mentioned earlier on the ‘Og. Kudos!