Archive for big data

if then [reading a book self-review]

Posted in Statistics on October 26, 2020 by xi'an

Nature of 17 September 2020 has a somewhat surprising comment section where an author, Jill Lepore from Harvard University, actually summarises her own book, If Then: How the Simulmatics Corporation Invented the Future. This book is the (hi)story of a precursor of Big Data analytics, Simulmatics, which as early as 1959 used clustering and simulation to predict election results and, if possible, to figure out discriminant variables. Which apparently contributed to John F. Kennedy’s victory over Richard Nixon in 1960. Rather than admiring the analytic abilities of such precursors (!), the author is blaming them for election interference. A criticism that could apply to any kind of polling, properly or improperly conducted. The article also describes how Simulmatics went into advertising, econometrics and counter-insurgency, vainly trying to predict the occurrence and location of riots (at home) and revolutions (abroad). And argues in an all-encompassing critique against any form of data analytics applied to human behaviour. And praises the wisdom of 1968 protesters over current Silicon Valley researchers (whose bosses may have been among these 1968 protesters!)… (Stressing again that my comments come from reading and reacting to the above Nature article, not the book itself!)

Calling Bullshit: The Art of Scepticism in a Data‑Driven World [EJ’s book review]

Posted in Books, Statistics on August 26, 2020 by xi'an

“…this book will train readers to be statistically savvy at a time when immunity to misinformation is essential: not just for the survival of liberal democracy, as the authors assert, but for survival itself. Perhaps a crash course on bullshit detection should be a mandatory part of the school curriculum.”

In the latest issue of Nature, EJ Wagenmakers has written a review of the book Calling Bullshit, by Carl Bergstrom and Jevin West. The book grew out of a course taught by the authors at the University of Washington during Spring Quarter 2017 and aimed at teaching students how to debunk bullshit, that is, the misleading exploitation of statistics and machine learning. Which I have not read. In his overall positive review, EJ regrets the poor data visualisation scholarship of the authors, who could have demonstrated and supported the opportunity for a visual debunking of the original data. And the lack of alternative solutions like Bayesian analysis to counteract p-fishing. Of course, the need for debunking and exposing statistical-sounding misinformation has never been so pressing.

Expectation Propagation as a Way of Life on-line

Posted in pictures, Statistics, University life on March 18, 2020 by xi'an

After a rather extended shelf-life, our paper “Expectation propagation as a way of life: a framework for Bayesian inference on partitioned data”, which was started when Andrew visited Paris in… 2014!, and to which I only marginally contributed, has now appeared in JMLR! Which happens to be my very first paper in this journal.

the most important statistical ideas of the past 50 years

Posted in Books, pictures, Statistics, Travel on January 10, 2020 by xi'an

Aki and Andrew are celebrating the New Year in advance by composing a list of the most important statistical ideas occurring (roughly) since they were born (or since Fisher died)! Like

  • substitution of computing for mathematical analysis (incl. bootstrap)
  • fitting a model with a large number of parameters, using some regularization procedure to get stable estimates and good predictions (e.g., Gaussian processes, neural networks, generative adversarial networks, variational autoencoders)
  • multilevel or hierarchical modelling (incl. Bayesian inference)
  • advances in statistical algorithms for efficient computing (with a long list of innovations since 1970, including ABC!), pointing out that a large fraction was of the divide & conquer flavour (in connection with large, if not necessarily Big, data)
  • statistical decision analysis (e.g., Bayesian optimization and reinforcement learning, getting beyond classical experimental design)
  • robustness (under partial specification, misspecification or in the M-open world)
  • EDA à la Tukey and statistical graphics (and R!)
  • causal inference (via counterfactuals)

Now, had I been painfully arm-bent into coming up with such a list, it would certainly have been shorter, for lack of opinion about some of these directions (even though the Biometrika deputy editorship has certainly helped in reassessing the popularity of different branches!), and I would presumably have been biased towards Bayes as well as more mathematical flavours. Hence objecting to the witty comment that “theoretical statistics is the theory of applied statistics” (p.10) and including Ghosal and van der Vaart (2017) as a major reference. Also bemoaning the lack of long-term structure and theoretical support of a branch of the machine-learning literature.

Maybe more space and analysis could also have been spent on “debates remain regarding appropriate use and interpretation of statistical methods” (p.11), in that a major difficulty with the latest in data science is not so much the method(s) as the data on which they are based, which, in a large fraction of cases, is not representative and is poorly, if at all, corrected for this bias. The “replication crisis” is thus only one (tiny) aspect of the challenge.

Michael in Le Monde [#2]

Posted in Books, pictures, Statistics, University life on January 5, 2020 by xi'an

A (second) back-page interview of Mike in Le Monde on the limits of academics working with major high tech companies. And on fatal attractions that are difficult to resist, given the monetary rewards. Like his previous interview, this is quite an interesting read (in French), although it obviously reflects a US perspective rather than a French one (with the same comment applying to the recent interview of Yann LeCun on France Inter).

“…French academic researchers, who are really very poorly paid.”

The first part is a prediction that the GAFAs will not continue hiring (full-time or part-time) academic researchers to keep doing their academic research as the quest for more immediate profits will eventually win over the image produced by these collaborations. But maybe DeepMind is not the best example, as e.g. Amazon seems to be making immediate gains from such collaborations.

“…the business model [of Amazon, Ali Baba, Uber, &tc] seeks to create new markets with, at the source, one may hope, new jobs.”

One stronger point of disagreement is about the above quote, namely that Uber or Amazon indeed create jobs. As I am uncertain that all job creations are worthwhile. Indeed, what kind of freedom is there in working after-hours for a reward that is so far below the minimum wage (in countries where there is a true minimum wage) that the workers [renamed entrepreneurs] are below the poverty line? Similarly, unless there are stronger regulations imposed by states or unions like the EU, it seems difficult to imagine how society as an aggregate of individuals can curb the hegemonic tendencies of the high tech leviathans…?