no publication without confirmation

“Our proposal is a new type of paper for animal studies (…) that incorporates an independent, statistically rigorous confirmation of a researcher’s central hypothesis.” (p.409)

A comment tribune in Nature of Feb 23, 2017, suggests running clinical trials in three stages towards meeting higher standards in statistical validation. The idea is to impose a preclinical trial run by an independent team, following initial research showing some potential for a new treatment. The three stages are thus (i) to generate hypotheses; (ii) to test hypotheses; (iii) to test broader application of hypotheses (p.410). While I am skeptical of the chances of this proposal reaching adoption (for various reasons, like, what would the incentive of the second team be [of the B team be?!], especially if the hypothesis is disproved? how would both teams share the authorship and presumably patenting rights of the final study? and how could independence be certain were the B team contracted by the A team?), the statistical arguments put forward in the tribune are also rather weak (in my opinion). Repeating experiments with a larger sample size and a hypothesis set a priori rather than cherry-picked is obviously positive, but moving from a p-value boundary of 0.05 to one of 0.01 and to a power of 80% is more a cosmetic than a foundational change, as Andrew and I pointed out in our PNAS discussion of Johnson two years ago.

“the earlier experiments would not need to be held to the same rigid standards.” (p.410)

The article contains a vignette on “the maths of predictive value” that makes intuitive sense but only superficially. First, “the positive predictive value is the probability that a positive result is truly positive” (p.411), a statement that implies a probability distribution on the space of hypotheses, although I see no Bayesian hint throughout the paper. Second, this (ersatz of a) probability is computed as the ratio of the number of positive results under the hypothesis over the total number of positive results. Which does not make much sense outside a Bayesian framework and even then cannot be assessed experimentally or by simulation without defining a distribution of the output under both hypotheses. Simplistic pictures like the above are not necessarily meaningful. And Nature should certainly invest in a statistical editor!
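To make the dependence on a prior explicit, here is a minimal sketch of the predictive-value computation through Bayes' formula. The function name `positive_predictive_value` and the prior probabilities are illustrative assumptions of mine, not figures taken from the Nature vignette; the only point is that the output cannot be obtained from the significance level and the power alone.

```python
def positive_predictive_value(prior, power, alpha):
    """P(hypothesis true | significant result), by Bayes' formula.

    prior: assumed probability that a tested hypothesis is true
    power: probability of a significant result when it is true
    alpha: probability of a significant result when it is false
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# The prior values are arbitrary choices made for illustration only.
for prior in (0.1, 0.3, 0.5):
    for alpha in (0.05, 0.01):
        ppv = positive_predictive_value(prior, power=0.8, alpha=alpha)
        print(f"prior={prior:.1f}  alpha={alpha:.2f}  power=0.80  PPV={ppv:.3f}")
```

With a prior of 0.1 and a power of 80%, for instance, the value moves from 0.64 at the 0.05 boundary to about 0.90 at the 0.01 boundary, and both numbers change entirely with another choice of prior.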

Steve Fienberg’s obituary in Nature

“Stephen Fienberg was the ultimate public statistician.”

Robin Mejia from CMU published in the 23 Feb issue of Nature an obituary of Steve Fienberg that sums up beautifully Steve’s contributions to science and academia. I like the above quote very much, as indeed Steve was definitely involved in public policies, towards making those more rational and fair. I remember the time he came to Paris-Dauphine to give a seminar and talk on his assessment in a NAS committee on the polygraph (and my surprise at it being used at all in the US and even worse in judiciary issues). Similarly, I remember his involvement in making the US Census based on surveys rather than on an illusory exhaustive coverage of the entire US population. Including a paper in Nature about the importance of surveys. And his massive contributions to preserving privacy in surveys and databases, an issue in which he was a precursor (even though my colleagues at the French Census Bureau did not catch the opportunity when he spent a sabbatical in Paris in 2004). While it is such a sad circumstance that led to statistics getting a rare entry in Nature, I am glad that Steve can also be remembered that way.

Nature snapshot

The recent issue of Nature, as of Jan 26, 2017!, contained a cartload of interesting review and coverage articles, from the latest version of the quantum computer D-Wave, with a paragraph on quantum annealing that reminded me of a recent arXiv paper I could not understand, seemingly turning the mathematical problem of multivariate optimisation into a truly physical process, to the continuing (Nature-wise) debate on how to oppose Trump, to the biases and shortcomings of policing software, with a mention of Lum and Isaac I discussed here a few months ago, to the unsuspected difficulty of publishing a referee’s report when the publisher is Elsevier (unsuspected and unsurprising! although I know of colleagues and authors disapproving of my publishing referee’s reports identified as such), to an amazing picture of a bundle of neurons monitored simultaneously, to an entry in the career section on scientific computing and the importance of coding for young investigators, with R at the forefront!

sex, lies, & brain scans [not a book review]

“Sahakian and Gottwald discuss the problem of “reverse inference” regrettably late in the book.”

In the book review section of Nature [Jan 12, 2017 issue], there was a long coverage of the book Sex, Lies, & Brain Scans: How fMRI Reveals What Really Goes on in our Minds, by Barbara J. Sahakian and Julia Gottwald. While I have not read the book (which is not even out yet on amazon), I found some mentions of associating brain patterns with criminal behaviour quite puzzling: “neuroimaging will probably be an imperfect predictor of criminal behaviour”. Actually, much more than puzzling, both frightening with its Minority Report prospects [once again quoted as a movie rather than Philip K. Dick’s novel!], and bordering on the irrational, for associating breaking rules with a brain pattern. Of course this is just an impression from reading a book review and the attempts may be restricted to psychological diseases rather than attempts at social engineering and brain policing, but if this is the case, as suggested by the review, it is downright scary!

lords of the rings

In the 19 Jan 2017 issue of Nature [that I received two weeks later], a paper by Tarnita et al discusses regular vegetation patterns like fairy patterns. While this would seem like an ideal setting for point process modelling, the article does not seem to get into that direction, debating instead between ecological models, and combining vegetal self-organisation with subterranean insect competition. Since the paper seems to derive validation of a model by simulation means without producing a single equation, I went and checked the supplementary material attached to this paper. What I gathered from this material is that the system of differential equations used to build this model seems to be extrapolated by seeking parameter values “consistent with what is known” rather than estimated as in a statistical model. Given the extreme complexity of the resulting five-page model, I am surprised at the low level of validation of the construct, with no visible proof of stationarity of the (stochastic) model thus constructed, and no model assessment in a statistical sense. Of course, a major disclaimer applies: (a) this area does not even border my domains of (relative) expertise and (b) I have not spent much time perusing the published paper and the attached supplementary material. (Note: This issue of Nature also contains a fascinating review paper by Nielsen et al. on a detailed scenario of human evolutionary history, based on the sequencing of genomes of extinct hominids.)

Elsevier in the frontline

“Viewed this way, the logo represents, in classical symbolism, the symbiotic relationship between publisher and scholar. The addition of the Non Solus inscription reinforces the message that publishers, like the elm tree, are needed to provide sturdy support for scholars, just as surely as scholars, the vine, are needed to produce fruit. Publishers and scholars cannot do it alone. They need each other. This remains as apt a representation of the relationship between Elsevier and its authors today – neither dependent, nor independent, but interdependent.”

There were two items of news related to the publishark Elsevier in the latest issue of Nature I read. One was that Germany, Peru, and Taiwan no longer had access to Elsevier journals, after negotiations or funding stopped. Meaning the scientists there have to find alternative ways to procure the papers, from the authors’ webpage [I do not get why authors fail to provide their papers through their publication webpage!] to peer-to-peer platforms like Sci-Hub. Beyond this short term solution, I hope this pushes for the development of arXiv-based journals, like Gowers’ Discrete Analysis. Actually, we [statisticians] should start planning a Statistics version of it!

The second item is about Elsevier developing its own impact factor index, CiteScore. While I do not deem the competition any more relevant for assessing research “worth”, seeing a publishark developing its own metrics sounds about as appropriate as Breitbart News starting an ethical index for fake news. I checked the assessment of Series B on that platform, which returns the journal as ranking third, with the surprising inclusion of the Annual Review of Statistics and its Application [sic], a review journal that only started two years ago, of Annals of Mathematics, which does not seem to pertain to the category of Statistics, Probability, and Uncertainty, and of Statistics Surveys, an IMS review journal that started in 2009 (of which I was blissfully unaware). And the article in Nature points out that “scientists at the Eigenfactor project, a research group at the University of Washington, published a preliminary calculation finding that Elsevier’s portfolio of journals gains a 25% boost relative to others if CiteScore is used instead of the JIF”. Not particularly surprising, eh?!

When looking for an illustration of this post, I came upon the hilarious quote given at the top: I particularly enjoy the newspeak reversal between the tree and the vine, the parasite publishark becoming the support and the academics the (invasive) vine… Just brilliant! (As a last note, the same issue of Nature mentions New Zealand aiming at getting rid of all invasive predators: I wonder if publishing predators are also included!)

quantic random generators

“…the random numbers should be unpredictable by any physical observer, that is, any observer whose actions are constrained by the laws of physics.”

A review paper in Nature by Acin and Masanes is the first paper I ever read there about random number generation! The central debate in the paper is about the notion of randomness, which the authors qualify as above. This seems to exclude the use of “our” traditional random number generators, although I do not see why they could not be used with an unpredictable initialisation, which does not have to be done according to a specific probability distribution. The only thing that matters is unpredictability.
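As an illustration of what an unpredictable initialisation of one of “our” traditional generators could look like, here is a minimal sketch. The helper `seeded_generator` is a hypothetical name of mine, and taking the operating system's entropy pool (through `os.urandom`) as the source of unpredictability is an assumption, not something the paper discusses.

```python
import os
import random


def seeded_generator(seed=None):
    """Return a Mersenne Twister stream and the seed that initialised it."""
    if seed is None:
        # 256 bits taken from the operating system's entropy pool
        # (assumed here to be unpredictable to an outside observer)
        seed = int.from_bytes(os.urandom(32), "big")
    return random.Random(seed), seed


rng, seed = seeded_generator()
bits = [rng.getrandbits(1) for _ in range(32)]
print("".join(map(str, bits)))

# The stream stays deterministic: whoever learns the seed replays it exactly.
replay, _ = seeded_generator(seed)
assert bits == [replay.getrandbits(1) for _ in range(32)]
```

One caveat: the Mersenne Twister behind Python's `random` module is not designed for cryptographic use, so an unpredictable seed does not by itself guarantee that later outputs remain unpredictable to an observer who has seen enough earlier ones.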

“…the standard method for certifying randomness consists of running statistical tests on sequences generated by the device. However, it is unclear what passing these tests means and, in fact, it is impossible to certify with finite computational power that a given sequence is random.”
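A small sketch of why passing a statistical test says little about unpredictability: the frequency (monobit) test below, chosen by me as the simplest standard example and not taken from the paper, only checks the balance between zeros and ones, so a perfectly predictable alternating sequence passes it with the highest possible p-value.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for H0 'zeros and ones are balanced'."""
    n = len(bits)
    # Map bits to +1/-1 and sum; under H0 the sum is roughly N(0, n)
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2.0 * n))


alternating = [0, 1] * 500   # completely predictable sequence
constant = [1] * 1000        # obviously unbalanced sequence

print("alternating:", monobit_p_value(alternating))
print("constant:   ", monobit_p_value(constant))
```

The alternating sequence is only rejected once a further test (on runs, say) is added, and one can always construct a deterministic sequence that passes any finite battery of such tests.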

The paper supports instead physical and quantum devices. Justified or certified by [violations of] the Bell inequality, which separates classic from quantum. Not that I know anything about this. Or that I can make sense of the notations in the paper, like

[equation image nature20119-m1, not reproduced here]

which is supposed to translate that the bits are iid Uniform and independent of the environment. Actually, I understood very little of the entire review paper, which is quite frustrating since this may well be the only paper ever published in Nature about random number generation!

“…a generation rate of 42 random bits after approximately one month of measurements, was performed using two entangled ions in two traps at 1-m distance.”

It is also hard to tell whether or not this approach to quantum random number generation has foreseeable practical consequences. There already exist QRNGs, as shown by this example from ANU. And this much more readable review.