Archive for Nature

Bayesian maps of Africa

Posted in pictures, Statistics on March 21, 2018 by xi'an

A rather special issue of Nature this week (1 March 2018) as it addresses Bayesian geo-cartography and mapping childhood growth failure and educational achievement (along with sexual differences) all across Africa! Including the (nice) cover of the journal, a preface by Kofi Annan, a cover article by Brian Reich and Murali Haran, and the first two major articles of the journal, one of which includes Ewan Cameron as a co-author. As I was reading this issue of Nature in the train back from Brussels, I could not access the supplementary material, so could not look at the specifics of the statistics, but the maps look quite impressive with a 5×5 km² resolution. And inclusion not only of uncertainty maps but also of predictive maps on the probability of achieving WHO 2025 goals. Surprisingly close to one in some parts of Africa. In terms of education, there are strong oppositions between different regions, with the south of the continent, including Madagascar, showing a positive difference for women in terms of years of education. While there is no reason (from my train seat) to doubt the statistical analyses, I take quite seriously the reservation of the authors that the quality of the prediction cannot be better than the quality of the data, which is “determined by the volume and fidelity of nationally representative surveys”. Which relates to an earlier post of mine about a similar concern with the deaths in Congo.

AlphaGo [100 to] zero

Posted in Books, pictures, Statistics, Travel on December 12, 2017 by xi'an

While in Warwick last week, I read a few times through the Nature article on AlphaGo Zero, the new DeepMind program that learned to play Go by itself, through self-learning, within a few clock days, and achieved massive superiority (100 to 0) over the earlier version of the program, which (who?!) was based on a massive data-base of human games. (A Nature paper I also read while in Warwick!) From my remote perspective, the neural network associated with AlphaGo Zero seems more straightforward than the double network of the earlier version. It is solely based on the board state and returns a probability vector p for all possible moves, as well as the probability of winning from the current position. There are still intermediary probabilities π produced by a Monte Carlo tree search, which drive the computation of a final board, the (reinforced) learning aiming at bringing p and π as close as possible, via a loss function like

$$(z-v)^2 - \langle \pi, \log p \rangle + c\,\|\theta\|^2$$

where z is the game winner, v is the network's evaluation of the current position, and θ is the vector of parameters of the neural network. (Details obviously missing above!) The achievements of this new version are even more impressive than those of the earlier one (which managed to systematically beat top Go players) in that blind exploration of game moves repeated over some five million games produced a much better AI player. With a strategy at times remaining a mystery to Go players.
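For what it is worth, here is a minimal numerical sketch of such a loss, assuming it combines a squared error between the outcome z and the network's evaluation v, a cross-entropy between the tree-search probabilities π and the network probabilities p, and a squared-norm penalty on θ. The function name, the default value of c, and the clipping constant are my own illustrative choices, not taken from the paper.

```python
import numpy as np

def alphago_zero_style_loss(z, v, pi, p, theta, c=1e-4):
    """Toy version of (z - v)^2 - <pi, log p> + c * ||theta||^2.

    z     : game outcome (e.g. +1 or -1), scalar
    v     : network's evaluation of the position, scalar
    pi    : tree-search move probabilities, 1-d array summing to one
    p     : network move probabilities, 1-d array summing to one
    theta : flattened network parameters, 1-d array
    c     : penalty weight (hypothetical default)
    """
    pi = np.asarray(pi, dtype=float)
    p = np.asarray(p, dtype=float)
    theta = np.asarray(theta, dtype=float)

    value_term = (z - v) ** 2
    # Cross-entropy between pi and p; p is clipped to avoid log(0).
    policy_term = -np.dot(pi, np.log(np.clip(p, 1e-12, 1.0)))
    # Squared Euclidean norm of the parameters.
    penalty = c * np.dot(theta, theta)
    return value_term + policy_term + penalty

# Usage: the policy term is smallest when p matches pi.
pi = np.array([0.5, 0.5])
matched = alphago_zero_style_loss(1.0, 1.0, pi, pi, np.zeros(3))
mismatched = alphago_zero_style_loss(1.0, 1.0, pi, np.array([0.9, 0.1]), np.zeros(3))
print(matched, mismatched)
```

For a fixed π, the cross-entropy term is minimised when p equals π, which is the sense in which the learning brings the two probability vectors together.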

Incidentally, a two-page paper appeared on arXiv today with the title Demystifying AlphaGo Zero, by Dong, Wu, and Zhou. Which casts AlphaGo Zero as a special generative adversarial network. And invokes the Wasserstein distance as solving the convergence of the network. To conclude that “it’s not [sic] surprising that AlphaGo Zero show [sic] a good convergence property”… A most perplexing inclusion in arXiv, I would say.

5 ways to fix statistics?!

Posted in Books, Kids, pictures, Statistics, University life on December 4, 2017 by xi'an

In the last issue of Nature (Nov 30), the comment section contains a series of opinions on the reproducibility crisis, by five [groups of] statisticians. Including Blakeley McShane and Andrew Gelman, with whom [and others] I wrote a response to the seventy-author manifesto. The collection of comments is introduced with the curious sentence

“The problem is not our maths, but ourselves.”

Which I find problematic as (a) the problem is never with the maths, but possibly with the stats!, and (b) the problem lies in inadequate assumptions on the validity of “the” statistical model and in ignoring the resulting epistemic uncertainty. Jeff Leek’s suggestion to improve the interface with users seems to fall short on that level, while David Colquhoun’s Bayesian balance between p-values and false positives only addresses well-specified models. Michèle Nuijten strikes closer to my perspective by arguing that rigorous rules are unlikely to help, due to the plethora of possible post-data modellings. And Steven Goodman’s putting the blame on the lack of statistical training of scientists (who “only want enough knowledge to run the statistical software that allows them to get their paper out quickly”) is wishful thinking: not every scientific study involving data [i.e., the overwhelming majority] can involve a statistical expert and not every paper involving data analysis can be reviewed by a statistical expert. I thus cannot but repeat the conclusion of Blakeley and Andrew:

“A crucial step is to move beyond the alchemy of binary statements about ‘an effect’ or ‘no effect’ with only a P value dividing them. Instead, researchers must accept uncertainty and embrace variation under different circumstances.”

a lifetime word limit…

Posted in Books, Kids, pictures, University life on November 20, 2017 by xi'an

“Exceptions might have to be made for experts such as statisticians and bioinformaticians whose skills are required on many papers.”

One of these weird editorials periodically occurring in Nature. By Brian Martinson, suggesting that the number of words allotted to a scientist should be capped. Weird, indeed, and incomprehensible that Nature wastes one of its so desperately sought journal pages on such a fantastic (in the sense of fantasy, not as in great!) proposal. With sentences like “if we don’t address our own cognitive biases and penchant for compelling narratives, word limits could exacerbate tendencies to publish only positive findings, leading researchers to explore blind alleys that others’ negative results could have illuminated” not making much sense even in this fantasy academic world… As for the real world, the list of impossibilities and contradictions stemming from this proposal would certainly eat all of my allotted words. Even those allotted to a statistician. The supreme irony of the (presumably tongue-in-cheek) editorial is that the author himself does not seem particularly concerned by capping his own number of papers! (Nice cover, by the way!)

long journey to reproducible results [or not]

Posted in Books, pictures, Statistics, University life on November 17, 2017 by xi'an

A rather fascinating article in Nature of last August [hidden under a pile of newspapers at home!]. By Gordon J. Lithgow, Monica Driscoll and Patrick Phillips. About their endeavours to account for divergent outcomes in the replications [or lack thereof] of an earlier experiment on anti-aging drugs tested on roundworms. Rather than dismissing the failures or blaming the other teams, the above researchers engaged for four years (!) in the titanic and grubby task of understanding the reason(s) for such discrepancies.

Finding that once most causes for discrepancies (like gentle versus rough lab technicians!) were eliminated, there were still two “types” of worms, those short-lived and those long-lived, for reasons yet unclear. “We need to repeat more experiments than we realized” is a welcome conclusion to this dedicated endeavour, worth repeating in different circles. And apparently missing in the NYT coverage by Susan Dominus of the story of Amy Cuddy, a psychologist at the origin of the “power pose” theory that later got disputed for lack of reproducibility. An article whose main ideological theme is that Cuddy got singled out in the replication crisis because she is a woman and because her “power pose” theory is geared towards empowering women and minorities. Rather than because she keeps delivering the same message, mostly outside academia, despite the lack of evidence and statistical backup. (Dominus’ criticisms of psychologists with “an unusual interest in statistics” and of Andrew’s repeated comments on the methodological flaws of the 2010 paper that started it all are thus particularly unfair. A Slate article published after the NYT coverage presents an alternative analysis of this affair. Andrew also posted on Dominus’ paper, with a subsequent humongous trail of comments!)

Nature snapshots [and snide shots]

Posted in Books, pictures, Statistics, Travel, University life on October 12, 2017 by xi'an

A very rich issue of Nature I received [late] just before leaving for Warwick with a series of reviews on quantum computing, presenting machine learning as the most likely immediate application of this new type of computing. Also including irate letters and an embarrassed correction of an editorial published the week before reflecting on the need (or lack thereof) to remove or augment statues of scientists whose methods were unethical, even when eventually producing long-lasting advances. (Like the 19th Century gynecologist J. Marion Sims experimenting on female slaves.) And a review of a book on the fascinating topic of Chinese typewriters. And this picture above of a flooded playground that looks like a piece of abstract art thanks to the muddy background.

“Quantum mechanics is well known to produce atypical patterns in data. Classical machine learning methods such as deep neural networks frequently have the feature that they can both recognize statistical patterns in data and produce data that possess the same statistical patterns: they recognize the patterns that they produce. This observation suggests the following hope. If small quantum information processors can produce statistical patterns that are computationally difficult for a classical computer to produce, then perhaps they can also recognize patterns that are equally difficult to recognize classically.” Jacob Biamonte et al., Nature, 14 Sept 2017

One of the review papers on quantum computing is about quantum machine learning. Although like Jon Snow I know nothing about this, I find it rather dull as it spends most of its space on explaining existing methods like PCA and support vector machines. Rather than exploring potential paradigm shifts offered by the exotic nature of quantum computing. Like moving to Bayesian logic that mimics a whole posterior rather than produces estimates or model probabilities. And away from linear representations. (The paper mentions an O(√N) speedup for Bayesian inference in a table, but does not say more, which may thus be only about MAP estimators for all I know.) I also disagree with the Brave New World tone of the above quote or misunderstand its meaning. Since atypical and statistical cannot but clash, “universal deep quantum learners may recognize and classify patterns that classical computers cannot” does not have a proper meaning. The paper contains a vignette about quantum Boltzmann machines that finds a minimum entropy approximation to a four-state distribution, with comments that seem to indicate an ability to simulate from this system.

stop the rot!

Posted in Statistics on September 26, 2017 by xi'an

Several entries in Nature this week about predatory journals. Both from Ottawa Hospital Research Institute. One emanates from the publication officer at the Institute, whose role is “dedicated to educating researchers and guiding them in their journal submission”. And tells the tale of a senior scientist finding out that a paper submitted to a predatory journal and later rescinded was nonetheless published by the said journal. Which reminded me of a similar misadventure that occurred to me a few years ago. After having a discussion of an earlier paper therein rejected from The American Statistician, my PhD student Kaniav Kamary and I resubmitted it to the Journal of Applied & Computational Mathematics, from which I had received an email a few weeks earlier asking me in flowery terms for a paper. When the paper got accepted as such two days after submission, I got alarmed and realised this was a predatory journal, whose title played on the quasi-homonymous Journal of Computational and Applied Mathematics (Elsevier) and International Journal of Applied and Computational Mathematics (Springer). Just like the authors in the above story, we wrote back to the editors, telling them we were rescinding our submission, but never got back any reply or request for copyright transfer. Instead, requests for (diminishing) payments were regularly sent to us, for almost a year, until they ceased. In the meantime, the paper had been posted on the “journal” website and no further email of ours, including some from our University legal officer, induced a reply or action from the journal…

The second article in Nature is from a group of epidemiologists at the same institute, producing statistics about biomedical publications in predatory journals (characterised as such by the defunct Beall blacklist). And being much more vehement about the danger represented by these journals, whose “articles we examined were atrocious in terms of reporting”, and about authors submitting to them, deemed unethical for wasting human and animal observations. The authors of this article identify thirteen characteristics for spotting predatory journals, the first one being “low article-processing fees”, our own misadventure being the opposite. And they ask for higher control and auditing from the funding institutions over their researchers… Besides adding an extra layer to the bureaucracy, I fear this is rather naïve, as if the boundary between predatory and non-predatory journals were crystal clear, rather than a murky continuum. And putting the blame solely on the researchers rather than sharing it with institutions always eager to push their bibliometrics towards more automation of the assessment of their researchers.