Archive for Science

One statistical analysis must not rule them all

Posted in Books, pictures, Statistics, University life on May 31, 2022 by xi'an

E.J. (Wagenmakers), along with co-authors, published a (long) comment in Nature, rewarded by an illustration by David Parkins! It addresses the over-confidence often carried by (single) statistical analyses, and calls for comparisons across different datasets, different models, and different techniques (beyond different teams).

“To gauge the robustness of their conclusions, researchers should subject the data to multiple analyses; ideally, these would be carried out by one or more independent teams. We understand that this is a big shift in how science is done, that appropriate infrastructure and incentives are not yet in place, and that many researchers will recoil at the idea as being burdensome and impractical. Nonetheless, we argue that the benefits of broader, more-diverse approaches to statistical inference could be so consequential that it is imperative to consider how they might be made routine.”

If COVID-19 had one impact on the general public's perception of modelling, it is that, to quote Alfred Korzybski, the map is not the territory, i.e., the model is not reality. Hence, the outcome of a model-based analysis, including its uncertainty assessment, depends on the chosen model, and does not include the bias due to this choice. Which is much harder to ascertain, in a sort of "things we do not know we do not know" paradigm… In other words, while we know that all models are wrong, we do not know how wrong each model is. Except that they disagree with one another in experiments like the one above.

“Less understood is how restricting analyses to a single technique effectively blinds researchers to an important aspect of uncertainty, making results seem more precise than they really are.”

The difficulty with E.J.'s proposal is setting a framework for a range of statistical analyses. To what extent should one seek a different model or a different analysis? How should the multiple analyses be weighted? What probabilistic meaning can be attached to the uncertainty between analyses? How quickly will opportunistic researchers learn to play against the house and feign objectivity? And isn't statistical inference already equipped to handle multiple models?
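For concreteness, here is a minimal sketch (with invented data, and nothing to do with the comment's own examples) of what a tiny "multiverse" of analyses can reveal: three defensible estimates of the same effect, whose spread is a layer of uncertainty that any single analysis would hide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)              # a correlated covariate
y = 0.3 * x + 0.4 * z + rng.normal(size=n)    # "true" effect of x is 0.3

def effect_of_x(X, y):
    """OLS fit; the coefficient of x (first column) is the target."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

keep = np.abs(y - y.mean()) < 2 * y.std()     # one defensible outlier rule
analyses = {
    "x alone": effect_of_x(np.column_stack([x, np.ones(n)]), y),
    "x adjusted for z": effect_of_x(np.column_stack([x, z, np.ones(n)]), y),
    "x alone, outliers removed": effect_of_x(
        np.column_stack([x[keep], np.ones(keep.sum())]), y[keep]),
}
for name, est in analyses.items():
    print(f"{name:>26s}: {est:+.3f}")
# The spread across these estimates is itself a (crude) gauge of the
# model uncertainty the Nature comment worries about.
```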

Le Pen election win would be disastrous for research, France and Europe [Nature editorial]

Posted in Kids, Travel, University life on April 21, 2022 by xi'an

(…) Science is not often a big factor in France’s elections, and this one is no different. But Le Pen is appealing to scientists by pledging to repeal controversial reforms to research institutions enacted between 2007 and 2009 by centre-right president Nicolas Sarkozy — which Macron has continued. Both presidents sought to align France’s universities, research and funding systems more closely with those of the United States and the United Kingdom by giving universities more autonomy; improving links between academics and businesses; and increasing financial support for research-intensive corporations.

Sarkozy changed the law so that funders and university administrations could have more independence in making decisions. His government also provided generous tax breaks to businesses that invest in research and development.

(…) Although Le Pen’s [repeal] policy on the Sarkozy reforms might be welcomed by some researchers, National Rally’s wider programme for government will be anything but. For one, the party’s policy on restricting immigration is likely to hit collaborations with scientists in other countries. And minority communities would face severe discrimination under Le Pen. For example, she has said she wants to ban the wearing of headscarves in public by extending a law that prohibits them in [public] schools.

Furthermore, a Le Pen presidency would put France on a collision course with the EU. Her party is intending to violate European laws and regulations by restricting employment or state benefits for EU citizens from outside France; withholding payments into the EU budget; and ending free movement of people between France and its EU neighbours. Universities and research funders must also confront the possibility that a Le Pen government would seek to restrict academic freedom.

(…) Researchers should consider that any short-term gains in terms of funding would be completely outweighed by the disaster of a Le Pen win. And those dissatisfied with both presidential candidates and considering not voting at all should realize that this, too, is likely to be of benefit to Le Pen. Everyone should look at Hungary for an EU case study of what happens when a far-right leader is elected.

capture-recapture rediscovered

Posted in Books, Statistics on March 2, 2022 by xi'an

A recent Science paper applies capture-recapture to estimating how much medieval literature has been lost, using ancient lists of works and comparing them with the currently known corpus, to arrive at a 91% loss. Which begets the next question of how many ancient lists have been lost! Or how many of the observed ones are sheer copies of one another. At first I thought I had no access to the paper, and hence could not comment on the specific data or on how the uneven and non-random sampling behind this modelling was accounted for… But I still would not share the anti-modelling bias of this Harvard historian, given the superlative record of Anne Chao in capture-recapture methodology!

“The paper seems geared more toward systems theorists and statisticians, says Daniel Smail, a historian at Harvard University who studies medieval social and cultural history, and the authors haven’t done enough to establish why cultural production should follow the same rules as life systems. But for him, the bigger question is: Given that we already have catalogs of ancient texts, and previous estimates were pretty close to the model’s new one, what does the new work add?”

Once at Ca' Foscari, I realised the local network gave me access to the paper. The description of the Chao1 method, as far as I can tell, does not explain how the problematic collection of catalogs, in which duplicates (recaptures) can be observed, is taken into account. For one thing, the collection is far from iid, since some catalogs must have been built on earlier ones. It is also surprising imho that the authors spend space discussing unbiasedness when a more crucial issue is the randomness assumption behind the collected data.
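For readers unfamiliar with it, the Chao1 estimator itself fits in a few lines: with f₁ the number of works seen in exactly one catalog and f₂ the number seen in exactly two, the bias-corrected lower bound on total richness is S_obs + f₁(f₁−1)/2(f₂+1). A toy sketch, with made-up sightings standing in for catalog mentions:

```python
from collections import Counter

def chao1(counts):
    """Chao1 lower-bound estimate of total richness.

    counts: number of times each *observed* work appears across catalogs.
    f1 = works seen exactly once, f2 = works seen exactly twice.
    Bias-corrected form, defined even when f2 == 0.
    """
    f = Counter(counts)
    f1, f2 = f[1], f[2]
    s_obs = len(counts)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Toy data: how often each observed work shows up in the lists.
sightings = [1, 1, 1, 1, 2, 2, 3, 5, 1, 1]
est = chao1(sightings)
print(f"observed {len(sightings)}, estimated >= {est:.1f} works")
print(f"implied loss: {1 - len(sightings) / est:.0%}")
```

Note that the formula presumes (at least) exchangeable sampling of the catalogs, which is exactly the assumption questioned above.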

how many T-Rex can you fit in your backyard?

Posted in Statistics on April 30, 2021 by xi'an

A fascinating question examined in this issue of Science [as pointed out by Nature!] in a paper by Marshall et al. on how many T. rex(es) roamed the Earth at a given time (in the Cretaceous). The figure is evaluated from Damuth's Law, relying on estimates of its body mass (8 tons?), the range of its habitat, the longevity of the species (2.4 million years?), and its generation time (19 years?), somewhat surprisingly taking the age of the oldest observed fossil (28 years) as the species' maximum individual age.

“We assessed the impact of uncertainties in the data used with Monte Carlo simulations, but these simulations do not accommodate uncertainties that might stem from the choices made in the design of our approach.”

The resulting global evaluation is of an abundance of about 20,000 individuals at any given time, albeit with a 95% confidence interval ranging from 1,300 to 328,000 animals, over around 127,000 generations, for a total number of T. rex that ever lived amounting to 2.5 billion animals. A fun exercise, but I remain rather reserved about the validity of the evaluation, given the uncertainty in, and the poor data behind, most terms of the equation.
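As a rough illustration of the Monte Carlo propagation involved (with made-up input distributions; only the structure, Damuth's law feeding a standing abundance times a number of generations, follows the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000                                   # Monte Carlo draws

# All figures below are illustrative stand-ins, not the paper's inputs.
mass_g = rng.lognormal(np.log(8e6), 0.2, n)   # body mass, grams (~8 tons)
# Damuth's law: log10(density) = b - 0.75 * log10(mass), with the
# intercept b by far the most uncertain term (hypothetical spread here):
b = rng.normal(3.2, 0.6, n)
density = 10**b * mass_g**-0.75               # individuals per km²
area_km2 = 2.3e6                              # rough geographic range
standing = density * area_km2                 # alive at any one time

generations = 2.4e6 / 19                      # longevity / generation time
total_ever = standing * generations

lo, med, hi = np.percentile(standing, [2.5, 50, 97.5])
print(f"standing abundance: {lo:,.0f} / {med:,.0f} / {hi:,.0f}")
print(f"total ever lived (median): {np.median(total_ever):,.0f}")
```

Even in this sketch, the orders-of-magnitude spread comes almost entirely from the Damuth intercept, which is precisely the kind of term the quoted caveat about design choices applies to.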

laser sharp random number generator

Posted in Books, pictures, Statistics, University life on April 1, 2021 by xi'an

Caught the headline of Science News on a super-fast random number generator based on a dysfunctional laser, producing “254 trillion random digits per second”!

“…when the laser is shined on a surface, its light contains a constantly changing pattern of tiny pinpricks that brighten and dim randomly. The brightness at each spot in the pattern over time can be translated by a computer into a random series of ones and zeros.”

I presume this is covered in the original Science paper [which I cannot access], but the parallel series of 0's and 1's should be checked to produce independent Bernoulli B(½) variates before being turned into a genuine random number generator.
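The NIST SP 800-22 frequency (monobit) and runs tests are the most basic such checks, probing balance and serial alternation respectively. A sketch below, run on a placeholder stream from numpy's own generator rather than the laser's output:

```python
import math
import numpy as np

def monobit_p(bits):
    """NIST frequency (monobit) test: are 0s and 1s balanced?"""
    s = np.sum(2 * bits - 1)                  # endpoint of a +/-1 walk
    return math.erfc(abs(s) / math.sqrt(2 * bits.size))

def runs_p(bits):
    """NIST runs test: does the stream alternate like iid B(1/2) draws?"""
    n = bits.size
    pi = bits.mean()
    if abs(pi - 0.5) >= 2 / math.sqrt(n):     # balance prerequisite fails
        return 0.0
    v = 1 + np.count_nonzero(bits[1:] != bits[:-1])   # number of runs
    num = abs(v - 2 * n * pi * (1 - pi))
    return math.erfc(num / (2 * math.sqrt(2 * n) * pi * (1 - pi)))

bits = np.random.default_rng(3).integers(0, 2, 10**6)  # stand-in stream
print(f"monobit p = {monobit_p(bits):.3f}, runs p = {runs_p(bits):.3f}")
# Small p-values flag bias or serial dependence; a full battery
# (NIST SP 800-22, TestU01) probes much more than these two statistics.
```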
