Archive for Nature

Panch at the helm!

Posted in pictures, Travel, University life on January 8, 2020 by xi'an

Reading somewhat by chance a Nature article on the new Director of the National Science Foundation (NSF) nominated by Trump (and yet to be confirmed by the Senate), I found that his name, Sethuraman Panchanathan, was the name of a friend of my wife 30⁺ years ago when they were both graduate students in image processing at the University of Ottawa, Department of Electrical Engineering… And looking further into the matter, I realised that this was indeed the very friend we knew from that time, with whom we shared laughs, dinners, and a few day trips around Ottawa! While this is not the ultimate surprise, given that science administration is usually run by scientists, taken from a population pool that is not that large, as exemplified by earlier cases at the national or European level where I had some acquaintance with a then senior officer, it is nonetheless striking (and fun) to hear of a friend moving to a high visibility position after such a long gap. (When comparing NSF and ERC, the European Research Council, with French mathematician Jean-Pierre Bourguignon as current director also appearing in a recent Nature article, I was surprised to see that the ERC budget was more than twice the NSF budget.) Well, good luck to him for sailing these highly political waters!

Nature science images of the year

Posted in Books, pictures, Travel on December 19, 2019 by xi'an

limited shelf validity

Posted in Books, pictures, Statistics, University life on December 11, 2019 by xi'an

A great article from Steve Stigler in the new, multi-scaled, and so exciting Harvard Data Science Review magisterially operated by Xiao-Li Meng, on the limitations of old datasets. Illustrated by three famous datasets used by three equally famous statisticians, Quetelet, Bortkiewicz, and Gosset. None of whom were fundamentally interested in the data for their own sake. First, Quetelet’s data was (wrongly) reconstructed, and Quetelet missed the opportunity to beat Galton at discovering correlation. Second, Bortkiewicz went looking (or even cherry-picking!) for these rare events in yearly tables of mortality minutely divided between causes such as military horse kicks. The third dataset is not Guinness’s, but a test between two sleeping pills, conducted rather crudely on inmates from a psychiatric institution in Kalamazoo, with further mishandling by Gosset himself. Manipulations that turn the data into dead data, as Steve put it. (And illustrated with the above skull collection picture, as well as warning against attempts at resuscitating dead data into what could be called “zombie data”.)

“Successful resurrection is only slightly more common than in Christian theology.”

His global perspective on dead data is that they should stop being used rather than have their (shelf) life extended by turning into benchmarks recycled over and over as a proof of concept. If only (my two cents) because it leads to calibrating (and choosing) methods that do well on these benchmarks. Another example that could have been added to the skulls above is the Galaxy Velocity Dataset that makes frequent appearances in works estimating Gaussian mixtures. Which Radford Neal signaled at the 2001 ICMS workshop on mixture estimation as an inappropriate use of the dataset, since astrophysical arguments weighed against a mixture modelling.

“…the role of context in shaping data selection and form—context in temporal, political, and social as well as scientific terms—has been shown to be a powerful and interesting phenomenon.”

The potential for “dead-er” data (my neologism!) increases with the epoch in that the careful sleuth work Steve (and others) conducted about these historical datasets is absolutely impossible with the current massive data sets. Massive and proprietary. And presumably discarded once the associated neural net is designed and sold. Leaving the burden of unmasking the potential (or highly probable?) biases to others. Most interestingly, this echoes a “comment” in Nature of 17 October by Sabina Leonelli on the transformation of data from a national treasure to a commodity whose “ownership can confer and signal power”. But her call for openness and governance of research data seems as illusory as other attempts to sever the GAFAs from their extra-territorial privileges…

retire statistical significance [follow-up]

Posted in Statistics on December 9, 2019 by xi'an

[Here is a brief update sent by my coauthors Valentin, Sander, and Blake on events following the Nature comment “Retire Statistical Significance”.]

In the eight months since publication of the comment and of the special issue of The American Statistician, we are glad to see a rich discussion on internet blogs and in scholarly publications and popular media.

One important indication of change is that since March numerous scientific journals have published editorials or revised their author guidelines. We have selected eight editorials that not only discuss statistics reform but give concrete new guidelines to authors. As you will see, the journals differ in how far they want to go with the reform (all but one of the following links are open access).

1) The New England Journal of Medicine, “New Guidelines for Statistical Reporting in the Journal”

2) Pediatric Anesthesia, “Embracing uncertainty: The days of statistical significance are numbered”

3) Journal of Obstetric, Gynecologic & Neonatal Nursing, “The Push to Move Health Care Science Beyond p < .05”

4) Brain and Neuroscience Advances, “Promoting and supporting credibility in neuroscience”

5) Journal of Wildlife Management, “Vexing Vocabulary in Submissions to the Journal of Wildlife Management”

6) Demographic Research, “P-values, theory, replicability, and rigour”

7) Journal of Bone and Mineral Research, “New Guidelines for Data Reporting and Statistical Analysis: Helping Authors With Transparency and Rigor in Research”

8) Significance, “The S word … and what to do about it”

Further, some of you took part in a survey by Tom Hardwicke and John Ioannidis that was published in the European Journal of Clinical Investigation along with editorials by Andrew Gelman and Deborah Mayo.

We replied with a short commentary in that journal, “Statistical Significance Gives Bias a Free Pass”.

And finally, joining with the American Statistical Association (ASA), the National Institute of Statistical Sciences (NISS) in the United States has also taken up the reform issue.

where K. works

Posted in Books, Mountains, pictures, Statistics, Travel, University life on December 2, 2019 by xi'an

radioactive ant fiction [in J. Hymenoptera Research]

Posted in Books, Kids, pictures, University life on November 30, 2019 by xi'an

Following a link in Nature, I read a short communication in the Journal of Hymenoptera Research [which I confess I rarely peruse!], which sounded more like a B movie from the 1950s than a scientific article. Starting with the title “Ants trapped for years in an old bunker; survival by cannibalism”! (This is actually the second episode in the series.) While the bunker was intended for storing Soviet nuclear weapons, no radioactivity affected the ants and they (the colony) survived in the dark at the bottom of the bunker for 22 years, with no source of food but their own kind, with new ants falling into the bunker on a regular basis. Hence the title. What I found most surprising in the paper is the fact that it is a sheer description of an observation (with field pictures) and of an intervention (we set a 3m vertical boardwalk to allow for escape) that reminded me more of my childhood fascination with ants (involving radical interventions) than of a typical science paper!

quantum simulation or supremacy?

Posted in Books, Statistics, University life on November 11, 2019 by xi'an

Nature this week contains a prominent paper by Arute et al. reporting an experiment on a quantum computer running a simulation on a state-space of dimension 2⁵³ (where 53 is the number of qubits in their machine, plus one dedicated to error correction if I get it right). With a million simulations of the computer state requiring 200 seconds. Which they claim would take 10,000 years (3×10¹¹ seconds) to run on a classical super-computer. And which could be used towards producing certified random numbers, an impressive claim given the intrinsic issue of qubit errors. (This part is not developed in the paper but I wonder how a random generator could handle such errors.)
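As a back-of-the-envelope check on these orders of magnitude, here is a minimal sketch. It assumes 53 working qubits, hence a state space of dimension 2⁵³, and (my own assumption, not a figure from the paper) 8 bytes per stored complex amplitude.

```python
# Back-of-the-envelope checks on the figures quoted above.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

classical_seconds = 10_000 * SECONDS_PER_YEAR  # claimed classical run time
quantum_seconds = 200                          # reported quantum run time
speedup = classical_seconds / quantum_seconds  # ratio of the two run times

n_qubits = 53                                  # assumption: 53 working qubits
dimension = 2 ** n_qubits                      # size of the state space
# Assumption: each complex amplitude is stored as two 4-byte floats.
petabytes = dimension * 8 / 1e15

print(f"classical run: {classical_seconds:.2e} s, ratio {speedup:.1e}")
print(f"dimension {dimension:.2e}, about {petabytes:.0f} PB of amplitudes")
```

Under these assumptions, merely storing the full state vector already runs to tens of petabytes, which gives a feeling for why the classical side of the comparison is the contentious one.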

“…a “challenger” generates a random quantum circuit C (i.e., a random sequence of 1-qubit and nearest-neighbor 2-qubit gates, of depth perhaps 20, acting on a 2D grid of n = 50 to 60 qubits). The challenger then sends C to the quantum computer, and asks it apply C to the all-0 initial state, measure the result in the {0,1} basis, send back whatever n-bit string was observed, and repeat some thousands or millions of times. Finally, using its knowledge of C, the classical challenger applies a statistical test to check whether the outputs are consistent with the QC having done this.” The blog of Scott Aaronson

I have tried reading the Nature paper but had trouble grasping the formidable nature of the simulation they were discussing, as it seems to be the reproduction by a simulation of a large quantum circuit of depth 20, as helpfully explained in the above quote. And checking that the (non-uniform) distribution of the random simulation is the one expected. Which is the hard part and requires a classical (super-)computer to determine the theoretical distribution. See also the News & Views entry in the same issue of Nature. According to Wikipedia, “the best known algorithm for simulating an arbitrary random quantum circuit requires an amount of time that scales exponentially with the number of qubits”. However, IBM (a competitor of Google in the quantum computer race) counter-claims that the simulation of the circuit takes only 2.5 days on a classical computer with optimised coding. (And this should be old news by the time this blog post comes out, since even a US candidate for the presidency has warned about it!)
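To make the statistical check described in the quote a bit more concrete, here is a toy sketch of one statistic of this kind, the linear cross-entropy score. It is an illustration under strong assumptions and not the authors' code: a random unit vector stands in for the output state of a random circuit, the number of qubits is tiny (12), and the function names are mine.

```python
import numpy as np

def random_state(n_qubits, rng):
    """Stand-in for a random circuit output: a random complex unit vector."""
    dim = 2 ** n_qubits
    amp = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return amp / np.linalg.norm(amp)

def linear_xeb(probs, samples):
    """Linear cross-entropy score: dimension times the mean ideal
    probability of the observed bit strings, minus one."""
    return probs.size * probs[samples].mean() - 1.0

rng = np.random.default_rng(0)
n_qubits, n_samples = 12, 200_000

# Theoretical (non-uniform) output distribution, computed classically.
probs = np.abs(random_state(n_qubits, rng)) ** 2
probs /= probs.sum()  # guard against rounding error

# A faithful device samples from probs; a fully noisy one samples uniformly.
ideal = rng.choice(probs.size, size=n_samples, p=probs)
noise = rng.integers(probs.size, size=n_samples)

score_ideal = linear_xeb(probs, ideal)  # expected to be close to 1
score_noise = linear_xeb(probs, noise)  # expected to be close to 0
print(score_ideal, score_noise)
```

The score only requires the ideal probabilities of the strings actually observed, but computing those is exactly the part that scales exponentially with the number of qubits, hence the need for a classical (super-)computer.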