A very interesting article by Martyn Hooper appears in the Feb. 2013 issue of Significance I just received. (It is available on-line for free.) It raises the question as to how much exactly Price contributed to the famous Essay… Given the percentage of the Essay that can be attributed to Price with certainty (Bayes’ part stops at page 14 out of 32 pages), given the lack of the original manuscript by Bayes, given the delay between the composition of this original manuscript (1755?), its delivery to Price (1761?) and its publication in 1763, given the absence of any other document published by Bayes on the topic, I tend to concur with Martyn Hooper (and Sharon McGrayne) that Price contributed quite significantly to the 1763 paper. Of course, it would sound quite bizarre to start calling our approach to Statistics Pricean or Pricey (or even Priceless!) Statistics, but this may constitute one of the most striking examples of Stigler’s Law of Eponymy!
On Wednesday, I was reading the freshly delivered Significance and esp. the several papers therein about statisticians being indicted, fired, or otherwise sued for doing statistics. I mentioned a while ago the possible interpretations of the L’Aquila verdict (where I do not know whether any of the six scientists is a statistician), but did not know about Graciela Bevacqua’s hardship at the Argentinian National Statistics Institute, nor about David Nutt being sacked from the Advisory Council on the Misuse of Drugs, nor about Peter Wilmshurst being sued by NMT (a US medical device corporation) for expressing concern about a clinical trial they conducted. What is most frightening in those stories is that those persons ended up facing those hardships without any support from their respective institutions (quite the opposite in two cases!). And then, on the way home, I further read that the former head of the Greek National Statistics Institute (Elstat) was fired and indicted for over-estimating the Greek deficit, after resisting official pressure to lower it… Tough job!
I received yet another popular science book to review (for Significance), When the Earth was flat by Graeme Donald. The subtitle is “All the bits of Science we got wrong”, which is both very ambitious (“All”, really?!) and modest (in that most scientific theories are approximations waiting to be invalidated and improved by the next theory). (I wrote this review during my trip to Gainesville, maybe too quickly!)
The themes processed and debunked in this book are wide-ranging. In fact they do not necessarily fall under my definition of science. They often are related to commercial swindles and political agendas loosely based on plainly wrong scientific theories. The book is thus more about the uses of (poor) science than about Science itself.
I have just received the latest issue of Significance (June 2012) and there are plenty of interesting articles in it (with no horror story as in the previous issue!). From the cover story about finding emperor penguin colonies on satellite images via guano stains (large scale!, with a terrific and terrifying extract from Mawson’s journal) to “moral maps” à la Quételet, to teaching statistics as seen by the young statisticians section (of the RSS), to Tony O’Hagan’s favourite formula (where he curiously fails to mention Pythagoras, which is how I justify the formula to my students), to the inappropriateness of using hand X-rays to determine whether Indonesian smugglers are under age or not. The least convincing section is obviously the “controversy” one, where the authors make a mechanistic proposal to bypass the drawbacks of p-values and Type I error, without contemplating the ultimate uses of tests… Very pleasant read (I could have kept it for the looong flight to Australia…)
As usual, reading the latest issue of Significance is quite pleasant and rewarding (although as usual I have to compete with my wife to get hold of the magazine!). This current issue is dedicated to the (London) Olympics, with articles on predictions of future records, on whether or not the 1988 records can be beaten (the Seoul Olympics were the last games before more severe anti-drug tests were introduced), on advice to Usain Bolt for running faster (!) and on the objective dangers of dying from running a marathon (answer: it is much more “dangerous” to train!).
However, a most puzzling (and least statistical) article is Stephanie Kovalchik’s proposal for a gender-neutral Olympics. The author’s theme is that, in most sports (the exceptions being shooting, yachting, and horse riding, where competitions are mixed), raw performances of women are below those of men for physical and physiological reasons. Stephanie Kovalchik thus “question[s] whether a sex-stratified Olympics is the product of groundless stereotypes about male athletic superiority or could be justified by gender differences at the elite level of sport” (p.20). Unsurprisingly, she concludes that no amount of training seems capable of bringing both sexes to the same level: for instance, Paula Radcliffe, the fastest female marathon runner (2:15:24), is still more than 11 minutes behind Patrick Makau, the fastest male marathon runner (2:03:38). They are both super-terrific athletes, the top ones in their categories. Now, Paula runs the half-marathon and the marathon faster than the best male runners in my team (Insee Paris Club). Where’s the problem?! And why should we try to rank Paula against Patrick?!
A parenthesis: the author mentions a most bizarre (but ultimately inappropriate) exception: in the Badwater Ultramarathon, a crazy race covering 135 miles and going from Badwater, Death Valley, at 280’ (85m) below sea level, to the Mt. Whitney Portals at nearly 8,300’ (2530m), with a total of 13,000’ (3962m) of cumulative vertical ascent, women won four of the 25 editions of the race. I found this phenomenon quite curious and first went to check the records of the comparable ultra-trail du Mont Blanc, another even crazier race (168km, 9,600 metres of positive height gain, at mostly higher altitudes, between 1000m and 2500m), and saw that last year the first woman in the race was 13th overall, four and a half hours behind the winner (20:36 hours, believe it or not..!). Going back to the Badwater Ultramarathon, checking the results showed that the race actually attracts a very limited number of runners, from 17 finishers the first year to 83 last year (where the first woman was 7th, about 5 hours behind the winner), with a huge variation between runners and between years. So I would not draw much of a conclusion from this example, certainly not that “in an event where sheer dogged endurance, guts and determination must count for almost everything, we may be there already”. It is rather a law of small numbers: such extreme events attract a very small number of participants with incredibly variable finishing times, e.g. two of the four winning women won out of… 5 (1988) and 2 (1989) finishers, while the two other victories were achieved by Pamela Reed over 45 (2003) and 57 (2002) competitors, a much more remarkable feat. Meaning that one or two runners missing or giving up brings a huge change in the winning time. The ultra-trail du Mont Blanc now involves a thousand runners and there, numbers count. End of the parenthesis (with total respect to all those runners, I wish I could do it!).
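To make the small-numbers point a bit more concrete, here is a minimal simulation sketch in Python. Everything in it is a hypothetical assumption of mine, not something taken from the article or from the race records: finishing times are drawn from normal distributions with the same spread, women are on average slower by `gap` standard deviations, and women make up a fixed share of the field. The function name `prob_woman_wins` and all parameter values are purely illustrative.

```python
import random


def prob_woman_wins(n_runners, share_women=0.2, gap=0.5, n_sim=2000, seed=1):
    """Monte Carlo estimate of the chance that the overall winner is a woman.

    Hypothetical model: standardised finishing times are N(0, 1) for men and
    N(gap, 1) for women (smaller is faster), with a fixed share of women.
    """
    if n_runners < 2:
        raise ValueError("need at least two runners")
    rng = random.Random(seed)
    n_women = max(1, round(share_women * n_runners))
    n_men = n_runners - n_women
    wins = 0
    for _ in range(n_sim):
        best_man = min(rng.gauss(0.0, 1.0) for _ in range(n_men))
        best_woman = min(rng.gauss(gap, 1.0) for _ in range(n_women))
        if best_woman < best_man:
            wins += 1
    return wins / n_sim


if __name__ == "__main__":
    for size in (5, 50, 500):
        print(size, prob_woman_wins(size))
```

Under these made-up assumptions, the estimated chance of a woman winning outright is noticeably larger in a field of 5 than in a field of 500, even though the gap in averages is the same in both cases: in a large field the winner comes from further out in the tail, where a shift in means matters more.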
Going back to the paper’s proposal, Stephanie Kovalchik considers that “credit merit apart from hereditary luck will favour individuals who possess the best genes for sport. Thus, prejudice – in the true sense of pre-judging – at the Olympics runs deeper than gender lines. Geneticism more than sexism is to blame for making the possession of a Y chromosome an advantage at the Games” (p.21). She suggests instead ranking athletes by a “statistical adjustment [that would] remove the confounding factor of genetic inheritance, to provide a standard of achievement that all could aim at, no matter what their hereditary luck” (p.22). In essence, the winner would be the one that had gained the most compared with a “demographically matched sample of untrained individuals” (p.24). If I may, this sounds perfectly ridiculous! First, the whole point of the Games and of any sporting competition is to determine the “best” athlete. This is not an egalitarian goal and can and does lead to poor outcomes such as cheating, drug-enhanced performances, nationalistic appropriation, commercialisation, bribery, and so on. It is thus perfectly coherent to be against those competitions. (I am not a big fan of the Olympics myself for this reason. However, without competition, even at my very humble level, and with little hope of winning anything, I would certainly train much less than I currently do.) But to try to reward efforts to counteract physical differences sounds like political correctness pushed to the extreme! Second, and this is why I find the paper so a-statistical!, the adjustment must be with respect to a reference population. If we carry the argument to its limit, the only relevant population is made of the athlete him/herself. Indeed, genetic, sociological, cultural, geographical, financial, you-name-it, elements should all be taken into account! Which obviously makes the computation just impossible because then everyone is competing against him/herself.
Large-scale Inference by Brad Efron is the first IMS Monograph in this new series, coordinated by David Cox and published by Cambridge University Press. Since I read this book immediately after Cox and Donnelly’s Principles of Applied Statistics, I was thinking of drawing a parallel between the two books. However, while neither of them can be classified as a textbook [even though Efron’s has exercises], they differ very much in their intended audience and their purpose. As I wrote in the review of Principles of Applied Statistics, that book has an encompassing scope with the goal of covering all the methodological steps required by a statistical study. In Large-scale Inference, Efron focuses on empirical Bayes methodology for large-scale inference, by which he mostly means multiple testing (rather than, say, data mining). As a result, the book is centred on mathematical statistics and is more technical. (Which does not mean it is less of an exciting read!) The book was recently reviewed by Jordi Prats for Significance. Like the previous reviewer, and unsurprisingly, I found the book nicely written, with a wealth of R (colour!) graphs (the R programs and dataset are available on Brad Efron’s home page).
“I have perhaps abused the “mono” in monograph by featuring methods from my own work of the past decade.” (p.xi)
Sadly, I cannot remember if I read my first Efron paper via his 1977 introduction to the Stein phenomenon with Carl Morris in Pour la Science (the French translation of Scientific American) or through his 1983 Pour la Science paper with Persi Diaconis on computer intensive methods. (I would bet on the latter though.) In any case, I certainly read a lot of Efron’s papers on the Stein phenomenon during my thesis and it was thus with great pleasure that I saw he introduced empirical Bayes notions through the Stein phenomenon (Chapter 1). It actually took me a while but I eventually (by page 90) realised that empirical Bayes was a proper subtitle to Large-Scale Inference in that the large samples give some weight to the validation of empirical Bayes analyses, in the sense of reducing the importance of a genuine Bayesian modelling (even though I do not see why this genuine Bayesian modelling could not be implemented in the cases covered in the book).
“Large N isn’t infinity and empirical Bayes isn’t Bayes.” (p.90)
The core of Large-scale Inference is multiple testing and the empirical Bayes justification/construction of Fdr’s (false discovery rates). Efron wrote more than a dozen papers on this topic, covered in the book and building on the groundbreaking and highly cited Series B 1995 paper by Benjamini and Hochberg. (In retrospect, it should have been a Read Paper and so was made a “retrospective read paper” by the Research Section of the RSS.) Fdr’s are essentially posterior probabilities and therefore open to empirical Bayes approximations when priors are not selected. Before reaching the concept of Fdr’s in Chapter 4, Efron goes over earlier procedures for removing multiple testing biases. As shown by a section title (“Is FDR Control “Hypothesis Testing”?”, p.58), one major point in the book is that an Fdr is more of an estimation procedure than a significance-testing object. (This is not a surprise from a Bayesian perspective since the posterior probability is an estimate as well.)
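For readers who have not met it, the Benjamini and Hochberg step-up rule is short enough to sketch in a few lines of Python. This is my own minimal illustration, not code from the book (whose programs are in R): the function name `benjamini_hochberg` and the toy p-values are made up. With the N p-values sorted increasingly, the rule finds the largest rank k such that p_(k) ≤ kq/N and rejects the k smallest. Read the empirical Bayes way, N p_(k)/k is a plug-in estimate of the Fdr at p_(k) when the null proportion is taken to be one, so the rule rejects as long as that estimate stays below q.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    increasingly) with p_(k) <= k * q / N, then reject the k smallest.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / n:
            k_max = rank  # keep the largest rank meeting the bound
    rejected = [False] * n
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    toy = [0.02, 0.03, 0.035, 0.2]  # made-up p-values
    print(benjamini_hochberg(toy, q=0.05))
```

On the toy values, the three smallest p-values are rejected at q = 0.05 even though the smallest one misses its own threshold of 0.0125: this is the "step-up" feature, where a later rank meeting its bound carries the earlier ones with it.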
“Scientific applications of single-test theory most often suppose, or hope for rejection of the null hypothesis (…) Large-scale studies are usually carried out with the expectation that most of the N cases will accept the null hypothesis.” (p.89)
On the innovations proposed by Efron and described in Large-scale Inference, I particularly enjoyed the notions of local Fdrs in Chapter 5 (essentially plug-in posterior probabilities that a given observation stems from the null component of the mixture) and of the (Bayesian) improvement brought by empirical null estimation in Chapter 6 (“not something one estimates in classical hypothesis testing”, p.97) and the explanation for the inaccuracy of the bootstrap (which “stems from a simpler cause”, p.139), but found less crystal-clear the empirical evaluation of the accuracy of Fdr estimates (Chapter 7, “independence is only a dream”, p.113), maybe in connection with my early-career inability to explain Morris’s (1983) correction for empirical Bayes confidence intervals (pp. 12-13). I also discovered the notion of enrichment in Chapter 9, with permutation tests resembling some low-key bootstrap, and multiclass models in Chapter 10, which appear as if they could benefit from a hierarchical Bayes perspective. The last chapter happily concludes with one of my preferred stories, namely the missing species problem (on which I hope to work this very Spring).
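As a reminder of what a local Fdr looks like, here is a small Python sketch of the posterior probability of the null in a two-component mixture. It is only an illustration under assumptions of my own choosing: a N(0,1) null with weight `pi0`, a single N(`alt_mean`,1) alternative, and all parameters treated as known. In an actual empirical Bayes analysis the mixture density (and possibly the null itself) would be estimated from the data, which this sketch does not attempt.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a N(mean, sd^2) distribution at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0=0.9, alt_mean=3.0):
    """Posterior probability that z comes from the null component.

    Hypothetical two-groups model with known parameters:
    f(z) = pi0 * N(0, 1) + (1 - pi0) * N(alt_mean, 1),
    and local fdr(z) = pi0 * f0(z) / f(z).
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, mean=alt_mean)
    return null_part / (null_part + alt_part)


if __name__ == "__main__":
    for z in (0.0, 1.5, 3.0, 4.0):
        print(z, round(local_fdr(z), 4))
```

With these illustrative values, an observation near zero is almost surely null, one at 1.5 (halfway between the two component means) simply returns the prior weight 0.9, and one at 4 has a local Fdr well below 5%.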
In the weekend edition of Le Monde, more precisely in the Sciences section, I read a report on a 2011 study made by Levy et al. who observed that the birthrate drops at Halloween and surges at Valentine’s Day… The above graph illustrates the fact for Halloween, with a significant [meaning?!] 5.3% decrease for spontaneous births. The increase for Valentine’s Day is 3.6% (still for spontaneous births). Even though those data are the result of a survey of all births in the United States over 11 years, thus unlikely to exhibit sampling biases, I am fairly bemused both by the phenomenon and by the interpretation made in the study, namely that “pregnant women may be able to control the timing of spontaneous births” (while I find less astounding that “scheduled births are also influenced by the cultural representations of the two holidays”, even though there may be an administrative bias as well). Being unfamiliar with the U.S. procedure for delivery of birth certificates (and with how much of a public holiday Valentine’s Day and Halloween are), I wonder if this may be a reporting rather than biological bias…
Reading now into the paper (thanX, A.!), I see that “these holidays have the advantage of widespread participation, but without ordinarily resulting in the absence of physicians from work, as on certain federal holidays”, so my first idea that the Halloween gap [more pronounced than the Valentine surge] could be due to reduced medical or administrative staff does not seem so likely. The authors mention using an analysis of covariance model to build their significance test, adjusting for weekday and year effect, even though neither the model [what are the covariates?] nor the statistical analysis is provided in the paper. Looking at the equivalent of the above graph for Valentine’s Day shows more variability along the window of 15 days used by the authors. It would be fairly interesting to check throughout the years if other variations of that magnitude occur (and if they are always related to culturally significant days), before accepting the conclusion that “pregnant women can expedite or delay spontaneous births, within a limited time frame, in response to cultural representations”…