Archive for p-value hacking

modelling protocol in Nature

Posted in Books, Kids, Statistics, University life on August 19, 2020 by xi'an

A three-page commentary in a recent issue of Nature is a manifesto for responsible modelling, with, among the numerous signatories, Deborah Mayo. (And Philip Stark as the only statistician I spotted.) The main theme is that the model is not the real thing, i.e., the map is not the territory. Which as such is hardly debatable. The point of the tribune is that, in the light of the pandemic crisis, a large portion of the general population has discovered that mathematical models are not the truth and that their predictions are to be taken with a few marshes of salt. Either because they are based on faulty or antiquated data, if any. Or because their approximation level is too crude to return any reliable figure. A failure to understand the nature of mathematical models that reminds me of the 2008 financial crisis, of the bemused question of Liz Windsor, and of the muddled response of economists:

“Why did nobody notice it?”

“Your Majesty,” eminent economists replied, “the failure to foresee the timing, extent and severity of the crisis and to head it off, while it had many causes, was principally a failure of the collective imagination of many bright people, both in this country and internationally, to understand the risks to the system as a whole.”

“People got a bit lax … perhaps it is difficult to foresee”

The manifesto calls for open assumptions, sensitivity analysis, uncertainty quantification, wariness of overfitting and structural biases (what is the utility function?), and the acknowledgement of ignorance as a possible outcome of the model. Which again seems completely sound, if not necessarily helpful when facing interlocutors asking for point estimates. I also regret that the tribune gives hardly any room to statistics and the model checking tools it has developed, except in mentioning p-hacking and the false feeling of certainty produced by a p-value. Plus a bizarre mention of a French movement of statactivistes, of which I had not heard and which seems connected to a book published in French by three of the signatories.

statistics in Nature [a tale of the two Steves]

Posted in Books, pictures, Statistics on January 15, 2019 by xi'an

In the 29 November issue of Nature, Stephen Senn (formerly at Glasgow) wrote an article about the pitfalls of personalized medicine, arguing that the statistics behind the reasoning are flawed.

“What I take issue with is the de facto assumption that the differential response to a drug is consistent for each individual, predictable and based on some stable property, such as a yet-to-be-discovered genetic variant.” S. Senn

One (striking) reason being that the studies rest on a sort of low-level determinism that does not account for many sources of variability. Over-confidence in causality ensues. Stephen argues that improvement lies in insisting on repeated experiments on the same subjects (with an increased challenge in modelling, since this requires longitudinal models with dependent observations). And in “drop[ping] the use of dichotomies”, favouring instead continuous modelling of measurements.
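To give a rough sense of what dichotomising costs, here is a small simulation sketch in Python. It is not taken from Stephen's article: the names `biomarker`, `response` and `responder`, the unit-variance Gaussian setting, and the zero threshold are all assumptions made for illustration. It compares the correlation between a predictor and a continuous measurement with the correlation obtained once that measurement is cut into a responder / non-responder label.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 50_000

# Hypothetical setting: a Gaussian predictor and a noisy continuous response.
biomarker = [rng.gauss(0.0, 1.0) for _ in range(n)]
response = [b + rng.gauss(0.0, 1.0) for b in biomarker]

# Dichotomised version of the same measurement (cut at the population median, 0).
responder = [1.0 if r > 0.0 else 0.0 for r in response]

r_continuous = correlation(biomarker, response)
r_dichotomised = correlation(biomarker, responder)
ratio = r_dichotomised / r_continuous

print(f"continuous:   {r_continuous:.3f}")
print(f"dichotomised: {r_dichotomised:.3f}")
print(f"ratio:        {ratio:.3f}")
```

In this bivariate Gaussian setting, a median split shrinks the correlation by a factor of sqrt(2/π), about 0.80, so the simulated ratio should land close to that value. Other thresholds or other distributions will give different losses.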

And in the 6 December issue, Steven Goodman calls (in the World view tribune) for probability statements to be attached as confidence indices to scientific claims. That he takes great pains to distinguish from p-values and links with Bayesian analysis. (Bayesian analysis that Stephen regularly objects to.) While I applaud the call, I am quite pessimistic about the follow-up it will generate, the primary reply being that posterior probabilities can be manipulated as well as p-values. And that Bayesian probabilities are not “real” probabilities (dixit Don Fraser or Deborah Mayo).

Big Bayes goes South

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on December 5, 2018 by xi'an

At the Big [Data] Bayes conference this week [which I found quite exciting despite a few last-minute cancellations by speakers] there were a lot of clustering talks, including the one by Amy Herring (Duke), using a notion of centering that should soon appear on arXiv. And by Peter Müller (UT, Austin) towards handling large datasets. Based on a predictive recursion that takes one value at a time, unsurprisingly similar to the update of Dirichlet process mixtures. (Inspired by a 1998 paper by Michael Newton and co-authors.) The recursion doubles in size at each observation, requiring culling of negligible components. Order matters? Links with the mixtures of mixtures of Malsiner-Walli et al. (2017). Also talks by Antonio Lijoi and Igor Pruenster (Bocconi, Milano) on completely random measures that are used in creating clusters. And by Sylvia Frühwirth-Schnatter (WU Wien) on creating clusters for the Austrian labor market of the impact of company closure. And by Gregor Kastner (WU Wien) on multivariate factor stochastic models, with a video of a large covariance matrix evolving over time and catching economic crises. And by David Dunson (Duke) on distance clustering. Reflecting like myself on the definitely ill-defined nature of the [clustering] object. As the sample size increases, spurious clusters appear. (Which reminded me of a disagreement I had had with David MacKay at an ICMS conference on mixtures twenty years ago.) Making me realise I had missed the recent JASA paper by Miller and Dunson on that perspective.
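For readers unfamiliar with predictive recursion, here is a minimal Python sketch of the classical Newton-style update of a mixing density, one observation at a time. It is not the version presented in the talk: the fixed grid over the parameter (which sidesteps the doubling of components mentioned above), the Gaussian kernel with unit variance, the weights 1/(i+1), and all names are assumptions made for illustration. Running it on the same data in two different orders gives a crude answer to the "order matters?" question.

```python
import math
import random


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def predictive_recursion(data, grid):
    """Newton-style predictive recursion on a fixed, evenly spaced grid.

    Returns the estimated mixing density evaluated at each grid point.
    """
    step = grid[1] - grid[0]
    f = [1.0 / (len(grid) * step)] * len(grid)  # uniform initial guess
    for i, x in enumerate(data, start=1):
        w = 1.0 / (i + 1)  # assumed weight sequence, decreasing to zero
        lik = [normal_pdf(x, theta) for theta in grid]
        # Marginal density of x under the current guess (Riemann sum).
        marginal = sum(l * fi for l, fi in zip(lik, f)) * step
        # Convex combination of the current guess and its one-point posterior.
        f = [(1.0 - w) * fi + w * l * fi / marginal for l, fi in zip(lik, f)]
    return f


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical data: equal mixture of N(-2, 1) and N(2, 1).
data = [rng.gauss(-2.0 if rng.random() < 0.5 else 2.0, 1.0) for _ in range(200)]

grid = [-6.0 + 0.1 * j for j in range(121)]
step = grid[1] - grid[0]

f_forward = predictive_recursion(data, grid)
f_reversed = predictive_recursion(list(reversed(data)), grid)

total_mass = sum(f_forward) * step
order_gap = max(abs(a - b) for a, b in zip(f_forward, f_reversed))

print(f"total mass of the estimate: {total_mass:.6f}")
print(f"largest gap between the two orderings: {order_gap:.6f}")
```

Each update keeps the estimate a proper density on the grid, and the two orderings return different estimates, which is the usual caveat with this type of recursion. Averaging over random permutations of the data is one common way of reducing that dependence.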

Some further snapshots (with short comments visible by hovering on the picture) of a very high quality meeting [says one of the organisers!]. Following suggestions from several participants, it would be great to hold another meeting at CIRM in the near future.