Statistics versus Data Science [or not]

Last week a colleague from Warwick forwarded us a short argument by Donald Macnaughton (a “Toronto-based statistician”) for switching the name of our field from Statistics to Data Science. This is not the first time I have heard this proposal and not the first time I express my strong disagreement with it! Here are the naughtonian arguments:

  1. Statistics is (at least in the English language) endowed with several meanings, from the compilation of numbers out of a series of observations, to the field, to the procedures proposed by the field. This is argued to be confusing for laypeople, and to miss both the connection with data at the core of our field and the indication that statistics gathers information from the data. Data science seems to convey both ideas… But it is equally vague, in that most scientific fields, if not all, rely on data and observations and on the structured exploitation of such data. Actually, a lot of so-called “data scientists” have specialised in the analysis of data from their original field, without deliberately embarking upon a career as data scientists. And without necessarily acquiring the proper tools for incorporating uncertainty quantification (aka statistics!).
  2. Statistics sounds old-fashioned and “old-guard” and “inward-looking” and unattractive to young talents, while they flock to Data Science programs. Which is true [that they flock] but does not mean we [as a field] must flock there as well. In five or ten years, who can tell whether this attraction of data science(s) will still be that strong? We already had to switch the names of our Master programs to Data Science or the like, which is surely more than enough.
  3. Data science encompasses other areas of science, like computer science and operations research, but this is argued not to be an issue, both in terms of potential collaborations and of gaining the upper hand as a “key part” of the field. Which is more wishful thinking than a certainty, given the existing difficulties in being recognised as a major actor in data analysis. (As for instance in a recent grant evaluation in “Big Data” where the evaluation committee involved no statistician. And where we got rejected.)

Children of Time [book review]

I came by this book in the common room of the mathematics department of the University of Warwick, which I visit regularly during my stays there, for it enjoys a book sharing box where I leave the books I’ve read (and do not want to carry back to Paris) and where I check for potential catches… One of these books was Tchaikovsky’s Children of Time, a great space-opera novel à la Arthur C Clarke, which got the 2016 Arthur C Clarke award, deservedly so (even though I very much enjoyed The Long Way to a Small, Angry Planet, Tchaikovsky’s book is much more of an epic cliffhanger where the survival of an entire race is at stake). The children of time are indeed the last remnants of the human race, surviving in an artificial sleep aboard an ancient spaceship that irremediably deteriorates. Until there is no solution but landing on a terraformed planet created eons ago. And defended by an AI spawned (or spammed) by the scientist in charge of the terraforming, who created a virus that speeds up evolution, with unintended consequences. Given that the strength of the book relies on these consequences, I cannot get into much detail about the alternative pathway to technology (incl. artificial intelligence) followed by the inhabitants of this new world, and even less about the concluding chapters that make up for a rather slow progression towards this final confrontation. An admirable and deep book I will most likely bring back to the common room on my next trip to Warwick! (As an aside I wonder if the title was chosen in connection with Goya’s picture of Chronus [Time] devouring his children…)

Barker at the Bernoulli factory

Yesterday, Flavio Gonçalves, Krzysztof Łatuszyński, and Gareth Roberts (Warwick) arXived a paper on Barker’s algorithm for Bayesian inference with intractable likelihoods.

“…roughly speaking Barker’s method is at worst half as good as Metropolis-Hastings.”

Barker’s acceptance probability (1965) is a smooth if less efficient version of Metropolis-Hastings. (Barker wrote his thesis in Adelaide, in the Mathematical Physics department. Most likely, he never interacted with Ronald Fisher, who died there in 1962.) This smoothness is exploited by devising a Bernoulli factory consisting of a 2-coin algorithm that manages to simulate the Bernoulli variable associated with the Barker probability, from a coin that can simulate Bernoullis with probabilities proportional to [bounded] π(θ), for instance by using a bounded unbiased estimator of the target, and from another coin that simulates another Bernoulli on a remainder term. Assuming the bound on the estimate of π(θ) is known [or part of the remainder term]. This is a neat result in that it expands the range of pseudo-marginal methods (and resuscitates Barker’s formula from oblivion!). The paper includes an illustration in the case of the far-from-toyish Wright-Fisher diffusion. [Making Fisher and Barker meet, in the end!]
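For my own understanding, here is a minimal Python sketch of a two-coin scheme of this kind, assuming the Barker probability can be written as c1·p1/(c1·p1+c2·p2), where c1 and c2 are known constants while p1 and p2 are only accessible through coin flips. The function names and the stand-in coins are mine and purely illustrative; this is not meant as the authors’ implementation.

```python
import random


def two_coin(c1, coin1, c2, coin2, rng):
    """Return 1 with probability c1*p1 / (c1*p1 + c2*p2).

    coin1 and coin2 are callables returning 0/1 with (unknown) success
    probabilities p1 and p2; only the constants c1 and c2 must be known.
    """
    while True:
        # Pick which coin to flip, proportionally to the known constants.
        if rng.random() < c1 / (c1 + c2):
            if coin1(rng):
                return 1  # success on the first coin: output 1
        else:
            if coin2(rng):
                return 0  # success on the second coin: output 0
        # Otherwise nothing is decided and the loop starts afresh.


def make_coin(p):
    """Hypothetical stand-in for a coin with success probability p."""
    return lambda rng: int(rng.random() < p)


def barker_frequency(c1, p1, c2, p2, n, seed=0):
    """Empirical frequency of 1's over n runs of the two-coin scheme."""
    rng = random.Random(seed)
    coin1, coin2 = make_coin(p1), make_coin(p2)
    return sum(two_coin(c1, coin1, c2, coin2, rng) for _ in range(n)) / n
```

With c1=2, p1=0.3, c2=1, p2=0.5, the target probability is 0.6/1.1≈0.545, and barker_frequency should settle near that value for large n. Each pass through the loop stops with probability (c1·p1+c2·p2)/(c1+c2), so the scheme slows down when both coins rarely succeed.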

the HMC algorithm meets the exchange algorithm

Julien Stoehr (now in Dublin, soon to join us as a new faculty in Paris-Dauphine!), Alan Benson and Nial Friel (both at UCD) arXived last week a paper entitled Noisy HMC for doubly-intractable distributions, which considers solutions for adapting Hamiltonian Monte Carlo to target densities that involve a missing constant, in the sense of our workshop last year in Warwick and in the theme pursued by Nial in the past years. The notion is thus to tackle a density π(θ)∝exp(V(X|θ))/Z(θ) when Z(θ) is intractable. In that case the gradient of log Z(θ) can be estimated as the expectation of the gradient of V(X|θ) [as a standard exponential family identity]. And the ratio of the Z(θ)’s appearing in the Metropolis ratio can be estimated by Iain Murray’s exchange algorithm, based on simulations from the sampling distribution attached to the parameter in the denominator.
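As a sanity check of these two ingredients, here is a toy Python sketch, assuming a finite sample space {0,…,5} with V(x|θ)=θx, so that exact simulation and the exact Z(θ) are available for comparison (which is of course not the case in a doubly-intractable problem). The Gaussian log-prior, the symmetric proposal, and all function names are my own assumptions, not taken from the paper.

```python
import math
import random

SUPPORT = list(range(6))  # toy sample space {0, ..., 5}


def V(x, theta):
    """Toy log unnormalised density V(x|theta) = theta * x."""
    return theta * x


def grad_V(x, theta):
    """Gradient of V(x|theta) in theta (free of theta in this toy case)."""
    return float(x)


def log_Z(theta):
    """Exact log normalising constant, only tractable in this toy model."""
    return math.log(sum(math.exp(V(x, theta)) for x in SUPPORT))


def sample_x(theta, rng):
    """Exact draw from exp(V(x|theta)) / Z(theta)."""
    weights = [math.exp(V(x, theta)) for x in SUPPORT]
    return rng.choices(SUPPORT, weights=weights, k=1)[0]


def grad_log_Z_estimate(theta, n, rng):
    """Monte Carlo version of grad log Z(theta) = E_theta[grad V(X|theta)]."""
    return sum(grad_V(sample_x(theta, rng), theta) for _ in range(n)) / n


def exchange_log_ratio(x_obs, theta, theta_new, rng,
                       log_prior=lambda t: -0.5 * t * t):
    """Log acceptance ratio of one exchange move (symmetric proposal assumed).

    The auxiliary y is simulated under theta_new, the parameter whose
    constant sits in the denominator of Z(theta)/Z(theta_new).
    The default log_prior is a hypothetical standard normal.
    """
    y = sample_x(theta_new, rng)
    return (log_prior(theta_new) - log_prior(theta)
            + V(x_obs, theta_new) - V(x_obs, theta)
            + V(y, theta) - V(y, theta_new))
```

A move from θ to θ’ would then be accepted when log(u) falls below exchange_log_ratio(x, θ, θ’, rng), for u uniform on (0,1). No Z(θ) appears anywhere in that function, which is the whole point of the exchange trick.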

The resulting algorithm proposed by the authors thus uses N simulations of auxiliary variables at each step ε of the leapfrog part, towards an approximation of the gradient term, plus another N simulations for approximating the ratio of the normalising constants Z(θ)/Z(θ’). While justified from an importance sampling perspective, this approximation is quite poor when θ and θ’ differ. A better solution [as shown in the paper] is to take advantage of all leapfrog steps and of associated auxiliary simulations to build a telescopic product of ratios where the parameter values θ and θ’ are much closer. The main difficulty is in drawing a comparison with the exchange algorithm, since the noisy HMC version is computationally more demanding. (A secondary difficulty is in having an approximate algorithm that no longer leaves the target density stationary.)
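To illustrate the telescopic product, here is a rough Python sketch on a toy model of my own (sample space {0,…,5}, V(x|θ)=θx), where Z(θ) can be computed exactly for checking purposes. The list path is a hypothetical stand-in for the successive parameter values visited along the leapfrog trajectory, and the importance sampling identity used is Z(θ)/Z(θ’)=E[exp(V(Y|θ)−V(Y|θ’))] for Y simulated under θ’. None of this reproduces the authors’ code.

```python
import math
import random

SUPPORT = list(range(6))  # toy sample space {0, ..., 5}


def V(x, theta):
    """Toy log unnormalised density V(x|theta) = theta * x."""
    return theta * x


def Z(theta):
    """Exact normalising constant, only available in this toy model."""
    return sum(math.exp(V(x, theta)) for x in SUPPORT)


def sample_x(theta, rng):
    """Exact draw from exp(V(x|theta)) / Z(theta)."""
    weights = [math.exp(V(x, theta)) for x in SUPPORT]
    return rng.choices(SUPPORT, weights=weights, k=1)[0]


def ratio_estimate(theta_num, theta_den, n, rng):
    """Importance sampling estimate of Z(theta_num) / Z(theta_den),
    based on n auxiliary draws simulated under theta_den."""
    total = 0.0
    for _ in range(n):
        y = sample_x(theta_den, rng)
        total += math.exp(V(y, theta_num) - V(y, theta_den))
    return total / n


def telescoping_ratio_estimate(path, n, rng):
    """Estimate Z(path[0]) / Z(path[-1]) as a product of ratios
    between neighbouring parameter values along the path."""
    estimate = 1.0
    for a, b in zip(path[:-1], path[1:]):
        estimate *= ratio_estimate(a, b, n, rng)
    return estimate
```

For instance, with path = [0.25 * i for i in range(9)], each factor only compares parameters that are 0.25 apart, while a single call to ratio_estimate(0.0, 2.0, n, rng) relies on importance weights ranging from exp(−10) to 1, dominated by rare draws of small values.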

Bayes Comp 2018

After a rather extended wait, I learned today of the dates of the next MCMski conference, now called Bayes Comp, in Barcelona, Spain, March 26-29, next year (2018). With a cool webpage! (While the ski termination has been removed from the conference name, there are ski resorts located not too far from Barcelona, in the Pyrenees.) Just unfortunate that it falls on the same dates as the ENAR 2018 meeting. (And as the Gregynog Statistical Conference!)

air static

[On an Air France flight for Birmingham, two young French students apparently studying in Warwick kept blathering the entire time, with an utter lack of concern for their surroundings. Note: Les Marseillais is a particularly idiotic reality show on French TV.]

  • …I stopped watching Les Marseillais, it’s not even a conscious thing, you know…
  • …totally, for sure, I stopped too, I had too many episodes to catch up on…

Gregynog #2 [jatp]

