**S**ophie Donnet pointed out to me this arXived paper by Tianxi Li, Elizaveta Levina, and Ji Zhu, on a network resampling strategy for X validation, where I appear as a datapoint rather than as a [direct] citation! Which reminded me of the “where you are the hero” gamebooks with which my kids briefly played, before computer games took over. The model selection method is illustrated on a dataset made of X citations [reduced to 706 authors] in all papers published between 2003 and 2012 in the Annals of Statistics, Biometrika, JASA, and JRSS Series B. With the outcome being the determination of a number of communities, 20, which the authors labelled as they wanted, based on 10 authors with the largest number of citations in the category. As it happens, I appear in the list, within the “mixed (causality + theory + Bayesian)” category (!), along with Jamie Robbins, Paul Fearnhead, Gilles Blanchard, Zhiqiang Tan, Stijn Vansteelandt, Nancy Reid, Jae Kwang Kim, Tyler VanderWeele, and Scott Sisson, which is somewhat mind-boggling in that I am pretty sure I never quoted six of these authors [although I find it hilarious that Jamie appears in the category, given that we almost got into a car crash together, at one of the Valencià meetings!].

## Archive for causality

## statistics in Nature [a tale of the two Steves]

Posted in Books, pictures, Statistics with tags Bayesian Analysis, causality, clinical trials, frequentism, Nature, p-value hacking, placebo effect, statistical evidence, Stephen Senn, variability on January 15, 2019 by xi'an**I**n the 29 November issue of Nature, Stephen Senn (formerly at Glasgow) wrote an article about the pitfalls of personalized medicine, for the statistics behind the reasoning are flawed.

“What I take issue with is the de facto assumption that the differential response to a drug is consistent for each individual, predictable and based on some stable property, such as a yet-to-be-discovered genetic variant.”S. Senn

One (striking) reason being that the studies rest on a sort of low-level determinism that does not account for many sources of variability. Over-confidence in causality results. Stephen argues that improvement lies in insisting on repeated experiments on the same subjects (with an increased challenge in modelling since this requires longitudinal models with dependent observations). And to “drop the use of dichotomies”, favouring instead continuous modeling of measurements.

And in the 6 December issue, Steven Goodman calls (in the World view tribune) for probability statements to be attached as confidence indices to scientific claims. That he takes great pain to distinguish from p-values and links with Bayesian analysis. (Bayesian analysis that Stephen regularly objects to.) While I applaud the call, I am quite pessimistic about the follow-up it will generate, the primary reply being that posterior probabilities can be manipulated as well as p-values. And that Bayesian probabilities are not “real” probabilities (dixit Don Fraser or Deborah Mayo).

## Dutch book for sleeping beauty

Posted in Books, Kids, Statistics, University life with tags Bansky, BFF4, causality, Cindirella, decision theory, Dutch book, evidential decision theory, Harvard University, sleeping beauty paradox on May 15, 2017 by xi'an**A**fter my short foray in Dutch book arguments two weeks ago in Harvard, I spotted a recent arXival by Vincent Conitzer analysing the sleeping beauty paradox from a Dutch book perspective. (The paper *“A Dutch book against sleeping beauties who are evidential decision theorists”* actually appeared in Synthese two years ago, which makes me wonder why it comes out only now on arXiv. And yes I am aware the above picture is about Bansky’s Cindirella and not sleeping beauty!)

“if Beauty is an evidential decision theorist, then in variants where she does not always have the same information available to her upon waking, she is vulnerable to Dutch books, regardless of whether she is a halfer or a thirder.”

As recalled in the introduction of the paper, there exist ways to construct Dutch book arguments against thirders and halfers alike. Conitzer constructs a variant that also distinguishes between a causal and an evidential decision theorist (sleeping beauty), the later being susceptible to another Dutch book. Which is where I get lost as I have no idea of a distinction between those two types of decision theory. Quickly checking on Wikipedia returned the notion that the latter decision theory maximises the expected utility *conditional* on the decision, but this does not clarify the issue in that it seems to imply the decision impacts the probability of the event… Hence keeping me unable to judge of the relevance of the arguments therein (which is no surprise since only based on a cursory read).

## truth or truthiness [book review]

Posted in Books, Kids, pictures, Statistics, University life with tags Andrew Gelman, Cambridge University Press, causality, CHANCE, data science, Don Rubin, fracking, Howard Wainer, Oklahoma, testing, tribune, truthiness, University of Warwick on March 21, 2017 by xi'an**T**his 2016 book by Howard Wainer has been sitting (!) on my desk for quite a while and it took a long visit to Warwick to find a free spot to quickly read it and write my impressions. The subtitle is, as shown on the picture, *“Distinguishing fact from fiction by learning to think like a data scientist”*. With all due respect to the book, which illustrates quite pleasantly the dangers of (pseudo-)data mis- or over- (or eve under-)interpretation, and to the author, who has repeatedly emphasised those points in his books and ~~tribunes~~ opinion columns, including those in CHANCE, I do not think the book teaches how to think like a data scientist. In that an arbitrary neophyte reader would not manage to handle a realistic data centric situation without deeper training. But this collection of essays, some of which were tribunes, makes for a nice reading nonetheless.

I presume that in this post-truth and alternative facts [dark] era, the notion of *truthiness* is familiar to most readers! It is often based on a misunderstanding or a misappropriation of data leading to dubious and unfounded conclusions. The book runs through dozens of examples (some of them quite short and mostly appealing to common sense) to show how this happens and to some extent how this can be countered. If not avoided as people will always try to bend, willingly or not, the data to their conclusion.

There are several parts and several themes in Truth or Truthiness, with different degrees of depth and novelty. The more involved part is in my opinion the one about causality, with illustrations in educational testing, psychology, and medical trials. (The illustration about fracking and the resulting impact on Oklahoma earthquakes should not be in the book, except that there exist officials publicly denying the facts. The same remark applies to the testing cheat controversy, which would be laughable had not someone ended up the victim!) The section on graphical representation and data communication is less exciting, presumably because it comes *after* Tufte’s books and message. I also feel the 1854 cholera map of John Snow is somewhat over-exploited, since he only drew the map after the epidemic declined. The final chapter * Don’t Try this at Home* is quite anecdotal and at the same time this may the whole point, namely that in mundane questions thinking like a data scientist is feasible and leads to sometimes surprising conclusions!

*“In the past a theory could get by on its beauty; in the modern world, a successful theory has to work for a living.” (p.40)*

The book reads quite nicely, as a whole and a collection of pieces, from which class and talk illustrations can be borrowed. I like the “learned” tone of it, with plenty of citations and witticisms, some in Latin, Yiddish and even French. (Even though the later is somewhat inaccurate! *Si ça avait pu se produire, ça avait dû se produire* [p.152] would have sounded more vernacular in my Gallic opinion!) I thus enjoyed unreservedly Truth or Truthiness, for its rich style and critical message, all the more needed in the current times, and far from comparing it with a bag of potato chips as Andrew Gelman did, I would like to stress its classical tone, in the sense of being immersed in a broad and deep culture that seems to be receding fast.

## causality

Posted in Books, Statistics, University life with tags Annual Review of Statistics and Its Application, Bayes nets, Bayesian ideas, book review, causality, Causality in the Sciences, David Hume, inference, OUP, philosophy of sciences, repeatability of experiments on March 7, 2016 by xi'an**O**xford University Press sent me this book by Phyllis Illari and Frederica Russo, Causality (Philosophical theory meets scientific practice) a ~~little~~ while ago. (The book appeared in 2014.) Unless I asked for it, I cannot remember…

“The problem is whether and how to use information of general causation established in science to ascertain individual responsibility.” (p.38)

As the subtitle indicates, this is a philosophy book, not a statistics book. And not particularly intended for statisticians. Hence, I am not exactly qualified to analyse its contents, and even less to criticise its lack of connection with statistics. But this being a blog post… I read rather slowly through the book, which exposes a wide range (“a map”, p.8) of approaches and perspectives on the notions of causality, some ways to infer about causality, and the point of doing all this, concluding with a relativistic (and thus eminently philosophical) viewpoint defending a “pluralistic mosaic” or a “causal mosaic” that relates to all existing accounts of causality as they “each do something valuable” (p.258). From a naïve bystander perspective, this sounds like a new avatar of deconstructionism applied to causality.

“Simulations can be very illuminating about various phenomena that are complex and have unexpected effects (…) can be run repeatedly to study a system in different situations to those seen for the real system…” (p.15)

This is not to state that the book is uninteresting, as it provides a wide entry into philosophical attempts at categorising and defining causality, if not into the statistical aspects of the issue. (For instance, the problem whether or not causality can be proven uniquely from a statistical perspective is not mentioned.) Among those interesting points in the early chapters, a section (2.5) about simulation. Which however misses the depth of this earlier book on climate simulations I reviewed while in Monash. Or of the discussions at the interdisciplinary seminar last year in Hanover. I.J. Good’s probabilistic causality is mentioned but hardly detailed. (With the warning remark that one “should not confuse predictability with determinism [and] determinism with causality”, p.82.) Continue reading

## from statistical evidence to evidence of causality

Posted in Books, Statistics with tags Bayesian inference, causality, confounders, counterfactuals, George Casella, inference, Ithaca, JASA, odds ratio, risk ratio on December 24, 2013 by xi'an**I** took the opportunity of having to wait at a local administration a long while today (!) to read an arXived paper by Dawid, Musio and Fienberg on the−both philosophical and practical−difficulty to establish the probabilities of the causes of effects. The first interesting thing about the paper is that it relates to the Médiator drug scandal that took place in France in the past year and still is under trial: thanks to the investigations of a local doctor, Irène Frachon, the drug was exposed as an aggravating factor for heart disease. Or maybe the cause. The case-control study of Frachon summarises into a 2×2 table with a corrected odds ratio of 17.1. From there, the authors expose the difficulties of drawing inference about causes of effects, i.e. causality, an aspect of inference that has always puzzled me. (And the paper led me to search for the distinction between odds ratio and risk ratio.)

“And the conceptual and implementational difficulties that we discuss below, that beset even the simplest case of inference about causes of effects, will be hugely magnified when we wish to take additional account of such policy considerations.”

**A** third interesting notion in the paper is the inclusion of counterfactuals. My introduction to counterfactuals dates back to a run in the back-country roads around Ithaca, New York, when George told me about a discussion paper from Phil he was editing for JASA on that notion with his philosopher neighbour Steven Schwartz as a discussant. (It was a great run, presumably in the late Spring. And the best introduction I could dream of!) Now, the paper starts from the counterfactual perspective to conclude that inference is close to impossible in this setting. Within my limited understanding, I would see that as a drawback of using counterfactuals, rather than of drawing inference about causes. If the corresponding statistical model is nonindentifiable, because one of the two responses is always missing, the model seems inappropriate. I am also surprised at the notion of “sufficiency” used in the paper, since it sounds like the background information cancels the need to account for the treatment (e.g., aspirin) decision. The fourth point is the derivation of bounds on the probabilities of causation, despite everything! Quite an interesting read thus!

## Bayesian introductions at IXXI

Posted in Mountains, Statistics, Travel, University life with tags ABC, Bayesian statistics, causality, Grenoble, history of statistics, neurosciences, population genetics on October 28, 2013 by xi'an**T**en days ago I did a lighting-fast visit to Grenoble for a quick introduction to Bayesian notions during a Bayesian day organised by Michael Blum. It was supported by IXXI, Rhône Alpes Complex Systems Institute, a light structure that favors interdisciplinary research to model complex sytems such as biological or social systems, technological networks… This was an opportunity to recycle my Budapest overview from Bayes 250th to Bayes 2.5.0. *(As I have changed my email signature initial from X to IX, I further enjoyed the name IXXI!)* More seriously, I appreciated (despite the too short time spent there!) the mix of perspectives and disciplines represented in this introduction, from Bayesian networks and causality in computer science and medical expert systems, to neurosciences and the Bayesian theory of mind, to Bayesian population genetics. And hence the mix of audiences. The part about neurosciences and social representations on others’ mind reminded me of the discussion with Pierre Bessières we had a year ago on France Culture. Again, I am quite sorry and apologetic for having missed part of the day and opportunities for discussions, simply because of a tight schedule this week…