Archive for data privacy

One World ABC seminar [season 2]

Posted in Books, Statistics, University life on March 23, 2021 by xi'an

The One World ABC seminar will resume its talks on ABC methods with a talk on Thursday, 25 March, 12:30 CET, by Mijung Park, from the Max Planck Institute for Intelligent Systems, on the exciting topic of producing differential privacy by ABC. (Talks will take place on a monthly basis.)

Big Bayes goes South

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on December 5, 2018 by xi'an

At the Big [Data] Bayes conference this week [which I found quite exciting despite a few last minute cancellations by speakers] there were a lot of clustering talks including the ones by Amy Herring (Duke), using a notion of centering that should soon appear on arXiv. By Peter Müller (UT, Austin) towards handling large datasets. Based on a predictive recursion that takes one value at a time, unsurprisingly similar to the update of Dirichlet process mixtures. (Inspired by a 1998 paper by Michael Newton and co-authors.) The recursion doubles in size at each observation, requiring culling of negligible components. Order matters? Links with Malsiner-Walli et al. (2017) mixtures of mixtures. Also talks by Antonio Lijoi and Igor Pruenster (Bocconi Milano) on completely random measures that are used in creating clusters. And by Sylvia Frühwirth-Schnatter (WU Wien) on clustering the impact of company closure in the Austrian labor market. And by Gregor Kastner (WU Wien) on multivariate factor stochastic models, with a video of a large covariance matrix evolving over time and catching economic crises. And by David Dunson (Duke) on distance clustering. Reflecting like myself on the definitely ill-defined nature of the [clustering] object. As the sample size increases, spurious clusters appear. (Which reminded me of a disagreement I had had with David MacKay at an ICMS conference on mixtures twenty years ago.) Making me realise I missed the recent JASA paper by Miller and Dunson on that perspective.
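For readers unfamiliar with predictive recursion, here is a minimal sketch of the classical one-value-at-a-time update in the spirit of Newton and co-authors, run on a fixed grid for a normal location mixture. This is not the version presented in the talk (on a grid nothing doubles in size and nothing needs culling), and the function name, the normal kernel, the grid, and the weight sequence w_i = (i+1)^(-0.67) are all assumptions of mine made for illustration.

```python
import numpy as np

def predictive_recursion(x, grid, sigma=1.0, gamma=0.67):
    """Grid-based predictive recursion for a normal location mixture.

    Returns probability weights over `grid` estimating the mixing distribution.
    """
    f = np.full(len(grid), 1.0 / len(grid))  # uniform initial guess (assumption)
    for i, xi in enumerate(x, start=1):
        w = (i + 1.0) ** (-gamma)  # decaying weight sequence (assumption)
        # Normal kernel N(xi | theta, sigma^2) on the grid, up to a constant
        like = np.exp(-0.5 * ((xi - grid) / sigma) ** 2)
        post = f * like
        post /= post.sum()  # posterior on the grid given xi alone
        f = (1.0 - w) * f + w * post  # convex update, one observation at a time
    return f

# Simulated two-component data, purely illustrative
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
rng.shuffle(x)
grid = np.linspace(-6, 7, 131)

f = predictive_recursion(x, grid)
f_rev = predictive_recursion(x[::-1], grid)  # same data, reversed order
print(np.abs(f - f_rev).max())  # largest pointwise gap between the two orderings
```

Comparing the two outputs is one way to look at the "Order matters?" question: the estimate is a function of the sequence of observations rather than of the sample as a set, so different orderings need not return the same weights.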

Some further snapshots (with short comments visible by hovering on the picture) of a very high quality meeting [says one of the organisers!]. Following suggestions from several participants, it would be great to hold another meeting at CIRM in the near future.

the DeepMind debacle

Posted in Books, Statistics, Travel on August 19, 2017 by xi'an

“I hope for a world where data is at the heart of understanding and decision making. To achieve this we need better public dialogue.” Hetan Shah

As I was reading one of the Nature issues I brought on vacation, while the rain was falling on an aborted hiking day on the fringes of Monte Rosa, I came across a 20 July tribune by Hetan Shah, executive director of the RSS. A rare occurrence of a statistician’s perspective in Nature. The event prompting this column is the ruling against the Royal Free London hospital group providing patient data to DeepMind for predicting kidney injury. Without the patients’ agreement. And with enough information to identify the patients. The issues raised by Hetan Shah are that data transfers should become open, and that they should be commensurate in volume and detail with the intended goals. And that public approval should be sought. While I know nothing about this specific case, I find the article overly critical of DeepMind, whose interest in health related problems is certainly not pure and disinterested but which nonetheless can contribute to advances in (personalised) care and prevention through its expertise in machine learning. (Disclaimer: I have neither connection nor conflict with the company!) And I do not see exactly how public approval or dialogue can help in making progress in handling data, unless I am mistaken in my understanding of “the public”. The article mentions the launch of a UK project on data ethics, involving several [public] institutions like the RSS: this is certainly commendable and may improve how personal data is handled by companies, but I would not call this conglomerate representative of the public, which most likely does not really trust these institutions either…


Posted in R, Statistics, University life on June 4, 2017 by xi'an

A few weeks ago and then some, I [as occasional blogger!] got contacted to write a piece on this data-sharing platform. I then went and checked what this was all about, having the vague impression this was a platform where I could store and run R code, besides dropping collective projects, but from what I quickly read, it sounds more like being able to run R scripts from one’s machine using data and code stored on the platform. But after reading just one more blog entry I finally understood it is also possible to run R, SQL, NotebookJS (and LaTeX) directly on that platform, without downloading code or data to one’s machine. Which makes it a definite plus for this site, as users can experiment with no transfer to their computer. Hence on a larger variety of platforms. While personally I do not [yet?] see how to use it for my research or [limited] teaching, it seems like [yet another] interesting exploration of the positive uses of the Internet to collaborate and communicate on scientific issues! With no opinion on the privacy and data protection offered by the site, of course.

Steve Fienberg’s obituary in Nature

Posted in Statistics on March 10, 2017 by xi'an

“Stephen Fienberg was the ultimate public statistician.”

Robin Mejia from CMU published in the 23 Feb issue of Nature an obituary of Steve Fienberg that sums up beautifully Steve’s contributions to science and academia. I like the above quote very much, as indeed Steve was definitely involved in public policies, towards making those more rational and fair. I remember the time he came to Paris-Dauphine to give a seminar and talk on his assessment in a NAS committee on the polygraph (and my surprise at it being used at all in the US and even worse in judicial matters). Similarly, I remember his involvement in making the US Census based on surveys rather than on an illusory exhaustive coverage of the entire US population. Including a paper in Nature about the importance of surveys. And his massive contributions to preserving privacy in surveys and databases, an issue on which he was a precursor (even though my colleagues at the French Census Bureau did not catch the opportunity when he spent a sabbatical in Paris in 2004). While it is such a sad circumstance that led to statistics getting a rare entry in Nature, I am glad that Steve can also be remembered that way.