Archive for False positive

exoplanets at 99.999…%

Posted in Books, pictures, Statistics, University life on January 22, 2016 by xi'an

The latest Significance has a short article providing some coverage of the growing trend in the discovery of exoplanets, including the new techniques used to detect those exoplanets through their impact on their host stars. This [presumably] comes from the recent book Cosmos: The Infographics Book of Space [a side comment: new books seem to provide material for many articles in Significance these days!] and the above graph is also from the book, not the ultimate infographic representation in my opinion, given that a simple superposition of lines would do as well. Or better.

“A common approach to ruling out these sorts of false positives involves running sophisticated numerical algorithms, called Monte Carlo simulations, to explore a wide range of blend scenarios (…) A new planet discovery needs to have a confidence of (…) a one in a million chance that the result is in error.”

The above sentence is obviously of interest, first because the detection of false positives by Monte Carlo hints at a rough version of ABC to assess the likelihood of the observed phenomenon under the null [no detail provided] and second because the probability statement at the end is quite unclear as to its foundations… Reminding me of the Higgs boson controversy. The very last sentence of the article is however brilliant, albeit maybe unintentionally so:
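
For the curious, here is a minimal sketch of what such a Monte Carlo assessment could look like. Everything below is invented for illustration, the blend model and all its numbers included; the article gives no such detail. The idea is simply to simulate transit-like dips under the null (a background blend rather than a planet) and estimate the probability of a signal at least as strong as the observed one:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed transit depth (fractional dimming of the star).
observed_depth = 0.010

# Null model: the dip is produced by a blended eclipsing binary.
# The blend's eclipse depth, dilution factor, and noise level are all
# invented; a real study would simulate detailed blend scenarios.
n_sim = 1_000_000
blend_depth = rng.uniform(0.0, 0.05, n_sim)   # eclipse depth of the binary
dilution = rng.beta(2.0, 10.0, n_sim)         # fraction of light from the blend
noise = rng.normal(0.0, 0.001, n_sim)         # photometric noise

simulated_depth = blend_depth * dilution + noise

# Monte Carlo estimate of P(depth >= observed | blend scenario), a rough
# ABC-flavoured tail probability under the null. The article's one-in-a-million
# requirement would correspond to this falling below 1e-6.
p_false_positive = np.mean(simulated_depth >= observed_depth)
print(f"estimated false-positive probability: {p_false_positive:.6f}")
```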

“To date, 1900 confirmed discoveries have been made. We have certainly come a long way from 1989.”

Yes, 89 down, strictly speaking!

Statistical evidence for revised standards

Posted in Statistics, University life on December 30, 2013 by xi'an

In yet another permutation of the original title (!), Andrew Gelman posted the answer Val Johnson sent him after our (submitted) letter to PNAS. As Val did not send me a copy (although Andrew did!), I will not reproduce it here and rather refer the interested readers to Andrew's blog… In addition to Andrew's (sensible) points, here are a few idle (post-X'mas and pre-skiing) reflections:

  • “evidence against a false null hypothesis accrues exponentially fast” makes me wonder in which metric this exponential rate (in γ?) occurs;
  • that “most decision-theoretic analyses of the optimal threshold to use for declaring a significant finding would lead to evidence thresholds that are substantially greater than 5 (and probably also greater than 25)” is difficult to accept as an argument, since there is no trace of a decision-theoretic argument in the whole paper;
  • Val rejects our minimaxity argument on the basis that “[UMPBTs] do not involve minimization of maximum loss”, but the prior that corresponds to those tests minimises the integrated probability of not rejecting at threshold level γ, a loss function integrated against parameter and observation, in other words a Bayes risk (see the numerical sketch after this list). Point masses or spike priors are clear characteristics of minimax priors. Furthermore, the additional argument that “in most applications, however, a unique loss function/prior distribution combination does not exist” has been used by many to refute the Bayesian perspective and makes me wonder what arguments are left for using a (pseudo-)Bayesian approach;
  • the next paragraph is pure tautology: the fact that “no other test, based on either a subjectively or objectively specified alternative hypothesis, is as likely to produce a Bayes factor that exceeds the specified evidence threshold” is a paraphrase of the definition of UMPBTs, not an argument. I do not see why we should solely “worry about false negatives”, since minimising those should lead to a point mass on the null (or, more seriously, should not lead to the minimax-like selection of the prior under the alternative).
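
To make the minimisation point above concrete, here is a small numerical sketch, my own reconstruction for the one-sided normal test with known variance rather than code from Val's paper: among point alternatives, the spike that minimises the rejection boundary (hence maximises the probability that the Bayes factor exceeds γ, whatever the true mean) is μ* = σ√(2 log γ / n), the UMPBT alternative.

```python
import numpy as np

# One-sided test of a normal mean: H0: mu = 0 vs H1: mu = mu1 > 0,
# known sigma, n observations, evidence threshold gamma.
sigma, n, gamma = 1.0, 10, 25.0

# For a point alternative mu1, BF > gamma iff the sample mean exceeds
#   b(mu1) = sigma^2 * log(gamma) / (n * mu1) + mu1 / 2.
def boundary(mu1):
    return sigma**2 * np.log(gamma) / (n * mu1) + mu1 / 2.0

# Scan point alternatives and locate the one minimising the boundary,
# i.e. maximising P(BF > gamma).
grid = np.linspace(0.01, 3.0, 100_000)
mu_star_numeric = grid[np.argmin(boundary(grid))]
mu_star_theory = sigma * np.sqrt(2 * np.log(gamma) / n)

print(mu_star_numeric, mu_star_theory)                 # both ~0.802 for gamma=25
print(np.sqrt(n) * boundary(mu_star_theory) / sigma)   # z-boundary sqrt(2 log gamma) ~ 2.54
```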

Revised evidence for statistical standards

Posted in Kids, Statistics, University life on December 19, 2013 by xi'an

We just submitted a letter to PNAS with Andrew Gelman last week, in reaction to Val Johnson's recent paper “Revised standards for statistical evidence”, essentially summing up our earlier comments within 500 words. Actually, we wrote one draft each! In particular, Andrew came up with the (neat) rhetorical idea of alternative Ronald Fishers living in parallel universes, each of whom had set a different significance reference level and for whom alternative Val Johnsons would arise and propose a modification of the corresponding Fisher's level. For which I made the above graph, left out of the letter and its 500 words. It relates “the old z” and “the new z”, meaning the boundaries of the rejection zones when, for each golden dot, the “old z” is the previous “new z” and “the new z” is Johnson's transform. We even figured out that Val's transform brings the significance level down by a factor of 10 over a large range of values. As an aside, we also wondered why most of the supplementary material was spent on deriving UMPBTs for specific (formal) problems when the goal of the paper sounded much more global…
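
As a back-of-the-envelope illustration of the scale involved, here is a sketch of mine, not the code behind the graph: it uses the UMPBT calibration for the one-sided z-test, γ = exp(z²/2), and one plausible reading of the transform as a tenfold increase in the implied evidence threshold γ.

```python
import numpy as np
from scipy.stats import norm

# UMPBT calibration for the one-sided z-test: rejecting at z corresponds
# to a Bayes factor threshold gamma = exp(z^2 / 2), i.e. z = sqrt(2 log gamma).
def gamma_of_z(z):
    return np.exp(z**2 / 2)

def alpha_of_gamma(gamma):
    return norm.sf(np.sqrt(2 * np.log(gamma)))

z_05 = norm.isf(0.05)
print(gamma_of_z(z_05))       # ~3.87: odds of about 4 to 1 at the 0.05 level
print(alpha_of_gamma(25.0))   # ~0.0056
print(alpha_of_gamma(50.0))   # ~0.0026: Johnson's 0.005-0.001 range

# "New z" from "old z" if the transform multiplies the implied gamma by ten
# (an assumption of mine, one reading of the factor-of-ten drop):
old_z = np.linspace(1.5, 3.5, 5)
new_z = np.sqrt(old_z**2 + 2 * np.log(10.0))
print(norm.sf(old_z) / norm.sf(new_z))   # ratio of significance levels, ~10-15
```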

As I am aware we are not the only ones to have submitted a letter about Johnson's proposal, I am quite curious about the reception we will get from the editor! (Although I have to point out that all of my earlier submissions of letters to PNAS were accepted.)

Valen in Le Monde

Posted in Books, Statistics, University life on November 21, 2013 by xi'an

Valen Johnson made the headlines in Le Monde last week. (More precisely, in the scientific blog Passeur de Sciences. Thanks, Julien, for the pointer!) With the alarming title of “A study questions one major tool of the scientific approach”. The reason for this French fame is Valen's recent paper in PNAS, Revised standards for statistical evidence, where he puts forward his uniformly most powerful Bayesian tests (recently discussed on the 'Og) to argue against the standard 0.05 significance level and in favour of “the 0.005 or 0.001 level of significance.”

“…many statisticians have noted that P values of 0.05 may correspond to Bayes factors that only favor the alternative hypothesis by odds of 3 or 4–1…” V. Johnson, PNAS

While I do plan to discuss the PNAS paper later (and possibly write a comment letter to PNAS with Andrew), I find interesting the way it made the headlines within days of its (early edition) publication: the argument that replacing .05 with .001 would increase the proportion of reproducible studies is both simple and convincing for a science journalist. If only the issue with p-values and statistical testing could be that simple… For instance, the above quote from Valen is reproduced as “an [alternative] hypothesis that stands right below the significance level has in truth only 3 to 5 chances to 1 of being true”, the “truth” popping out of nowhere. (If you read French, the 300+ comments on the blog are also worth their weight in jellybeans…)
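
Incidentally, the missing ingredient in the journalist's rendering is the prior: a Bayes factor of 4 to 1 only turns into “4 chances to 1 of being true” under even prior odds. A quick sketch of that gap, an illustration of mine rather than anything in the paper or the article, using the same one-sided z-test calibration as in the sketch above:

```python
import numpy as np
from scipy.stats import norm

# Bayes factor implied by a result exactly at the one-sided 0.05 boundary,
# under the UMPBT calibration gamma = exp(z^2 / 2).
z = norm.isf(0.05)
bf = np.exp(z**2 / 2)   # ~3.87 in favour of the alternative

# Posterior probability of the alternative for several prior odds; the
# journalist's "3 to 5 chances to 1 to be true" implicitly sets prior odds to one.
for prior_odds in (1.0, 0.1, 0.01):
    post_odds = bf * prior_odds
    print(prior_odds, post_odds / (1 + post_odds))
# prior odds 1    -> posterior ~0.79
# prior odds 1/10 -> posterior ~0.28
```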

statistical significance as explained by The Economist

Posted in Books, Statistics, University life on November 7, 2013 by xi'an

There is a long article in this week's The Economist (also making the front cover), which discusses how and why many published research papers have unreproducible and most often “wrong” results. Nothing immensely new there, especially if you read Andrew's blog on a regular basis, but the (anonymous) writer(s) take(s) pains to explain how this relates to statistics and in particular to statistical testing of hypotheses. The above is an illustration from this introduction to statistical tests (and their interpretation).

“First, the statistics, which if perhaps off-putting are quite crucial.”

It is not the first time I have spotted a statistics-backed article in this journal, and so I assume it either has journalists with a statistics background or links with (UK?) statisticians. The description of why statistical tests can err is fairly (Type I – Type II) classical. Incidentally, it reports a finding of Ioannidis that, when reporting a positive at level 0.05, the expectation of a false positive rate of one out of 20 is “highly optimistic”. An evaluation opposed to, e.g., Berger and Sellke (1987), who reported too-early rejections in a large number of cases. More interestingly, the paper stresses that this classical approach ignores “the unlikeliness of the hypothesis being tested”, which I interpret as the prior probability of the hypothesis under test.
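
The arithmetic behind that “highly optimistic” claim is worth spelling out. Here is a sketch with illustrative figures in the spirit of the article's graphic; the specific numbers (10% of tested hypotheses true, 5% level, 80% power) are my guesses at a standard illustration, not lifted from the piece:

```python
# Why a 0.05 level does not mean one false positive in 20 *among positives*:
# the proportion of true hypotheses among those tested also matters.

n_tested = 1000       # hypotheses put to the test
prior_true = 0.10     # fraction actually true (the "unlikeliness")
alpha, power = 0.05, 0.80

true_hyps = n_tested * prior_true
false_hyps = n_tested - true_hyps

true_positives = power * true_hyps     # 80 correct detections
false_positives = alpha * false_hyps   # 45 false alarms

fdr = false_positives / (true_positives + false_positives)
print(f"share of positives that are false: {fdr:.0%}")   # ~36%, not 5%
```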

“Statisticians have ways to deal with such problems. But most scientists are not statisticians.”

The paper also reports on the lack of power in most studies, a report that I find a bit bizarre and even meaningless in its claim to compute an overall power, across studies and researchers and even fields. Even in a single study, the alternative to “no effect” is composite, hence has a power that depends on the unknown value of the parameter. Seeking a single value for the power requires some prior distribution on the alternative.
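
To illustrate the point, a minimal toy sketch of mine for a one-sided z-test: the power is a whole curve in the effect size, and collapsing it to a single number means integrating against some prior on the alternative, here an arbitrary half-normal.

```python
import numpy as np
from scipy.stats import norm

alpha, n, sigma = 0.05, 25, 1.0
z_alpha = norm.isf(alpha)

# Power of the one-sided z-test as a function of the true effect mu > 0:
def power(mu):
    return norm.sf(z_alpha - np.sqrt(n) * mu / sigma)

print(power(0.1), power(0.3), power(0.5))
# ~0.13, ~0.44, ~0.80: no single "power of the study"

# Averaging the curve against a prior on the effect (a half-normal with
# scale 0.3, an arbitrary choice) gives one number, but a prior-dependent one:
mus = np.abs(np.random.default_rng(0).normal(0.0, 0.3, 100_000))
print(power(mus).mean())
```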

“Peer review’s multiple failings would matter less if science’s self-correction mechanism—replication—was in working order.”

The next part of the paper covers the failings of peer review, which I discussed in the ISBA Bulletin, but it seems to me too easy to blame the referee for failing to spot statistical or experimental errors, when lacking access to the data or to the full experimental methodology and when under pressure to return (for free) a report within a short time window. The best that can be expected is that a referee detects the implausibility of a claim or an obvious methodological or statistical mistake. These are not math papers! And, as pointed out repeatedly, not all referees are statistically numerate…

“Budding scientists must be taught technical skills, including statistics.”

The last part discusses possible solutions to achieve reproducibility and hence higher confidence in experimental results. Paying for independent replication is the proposed solution, but it can obviously only apply to a small fraction of all published results. And having control bodies test at random the labs and teams behind a major publication seems rather unrealistic, if only because of the difficulty of staffing such bodies with able controllers… An interesting if pessimistic debate, in fine. And fit for the International Year of Statistics.

Numbers rule your world

Posted in Books, Statistics on February 22, 2010 by xi'an

Andrew Gelman gave me a copy of the recent book Numbers rule your world by Kaiser Fung, along with the comment that it was a nice book but not for us. I spent my “lazy Sunday” morning reading the book at the breakfast table and agree with Andrew on his assessment. (Waiting for the incoming blog review!) Numbers rule your world is unlikely to bring enlightenment to professional or academic statisticians, but it provides a nice and soft introduction to the use of statistics in everyday life, to the point that I would encourage my second and third year students to read it. It covers a few topics that are central to Statistics via ten newspaper-ised stories that make for a very light read, but nonetheless make the point. The themes in Numbers rule your world are

  • variability matters more than average, as illustrated by queuing phenomena;
  • correlation is not causation, but is often good enough to uncover patterns, as illustrated by epidemiology and credit scoring;
  • Simpson’s paradox accounts for apparent bias in group differences, as illustrated by SAT score differences between black students and white students;
  • false positives and false negatives have different impacts on the error (here comes Bayes theorem!), depending on population sizes and settings, as illustrated by the (great!) case of cheating athletes and polygraph tests (with a reference to Steve Fienberg‘s work; see the sketch after this list);
  • extreme events may exhibit causes, or not, as illustrated by a cheating lottery case (involving Jeff Rosenthal as the expert, not the cheater!) and a series of air crashes.
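
As a hedged illustration of the base-rate point in the fourth bullet, with all numbers invented for the sake of the example rather than taken from the book: the same test quality yields very different meanings for a “positive”, depending on how common cheating is in the tested population.

```python
# Bayes theorem for a screening test: posterior probability of cheating
# given a positive result, as a function of the prevalence of cheating.
# Sensitivity and specificity values are invented for illustration.

def prob_cheat_given_positive(prevalence, sensitivity=0.95, specificity=0.95):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prevalence in (0.30, 0.05, 0.001):
    print(f"prevalence {prevalence:.1%}: "
          f"P(cheat | positive) = {prob_cheat_given_positive(prevalence):.1%}")
# 30%   -> ~89%
# 5%    -> ~50%
# 0.1%  -> ~2%: most positives are then false positives
```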

The overall tone of Numbers rule your world is pleasant and engaging, at the other end of the stylistic spectrum from Taleb’s Black Swan. Fung’s point is obviously the opposite of Taleb‘s: he is showing the reader how well statistical modelling can explain apparently paradoxical behaviour. Fung also adopts a very neutral tone, again a major change from Taleb, maybe even too positive a one (the only mention of the current housing crisis in the pages Numbers rule your world dedicates to credit scoring comes in the conclusion, pp. 176-177). Now, in terms of novelty, I cannot judge the amount of innovation when compared with the (numerous) other popular science books on the topic. For instance, I think Jeff Rosenthal’s Struck by Lightning brings a rather deeper perspective, but maybe thus restricts the readership further…

Re-Read RSS Paper

Posted in Statistics, University life on August 28, 2009 by xi'an

In a kind of apologetic twist, the Research Section of the Royal Statistical Society has decided to have papers discussed that were already published as ordinary papers in JRSS Series B, because of their observed impact on the field. I think it is a terrific idea and I am looking forward to the resulting discussions since, given that people have had much more time to think about and to implement the proposed methodology, their discussions should be deeper and more definitive than discussions of new methodologies. (The danger being an implosion of the discussion topic into summaries of independent papers!) The first paper selected for this type of discussion is Benjamini and Hochberg’s ‘Controlling the false discovery rate: a practical and powerful approach to multiple testing’, which has had a long-term impact on the way multiple testing is handled. The discussion will occur during the Society’s 175th anniversary conference in Edinburgh on September 9th.
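
For readers coming to the paper fresh, here is a minimal sketch of the Benjamini–Hochberg step-up procedure it introduced, a standard textbook rendering of mine rather than code from the paper:

```python
import numpy as np

def benjamini_hochberg(pvalues, q=0.05):
    """Return a boolean array of rejections controlling the FDR at level q."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Largest k with p_(k) <= k * q / m; reject the k smallest p-values.
    below = ranked <= (np.arange(1, m + 1) * q / m)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # index of last p-value under the line
        reject[order[: k + 1]] = True
    return reject

# Toy example: a few small p-values among mostly null-looking ones.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.5, 0.7, 0.9]
print(benjamini_hochberg(pvals, q=0.05))   # rejects the first two only
```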

The next Read Paper will take place on October 14 in London and is about ‘Particle Markov chain Monte Carlo’ by Christophe Andrieu, Arnaud Doucet, and Roman Holenstein, a novel approach to the construction of Markov kernels via sequential Monte Carlo methods. The paper should soon be available on the RSS website and anyone is welcome to submit written comments on the paper by October 28; if submitted before October 14, they may even be read during the meeting. Those contributions must be no longer than 400 words (plus figures) and should be submitted to Charlotte Stovell. If things proceed as last year, there should in addition be a pre-ordinary meeting preceding the regular meeting, organised by the Young Statisticians section of the RSS, in order to better explore the paper being discussed.
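
To give a flavour of the methodology, here is a bare-bones particle marginal Metropolis-Hastings sketch for a toy linear Gaussian state-space model, my own illustration of the idea rather than the algorithmic details of the paper: a bootstrap particle filter supplies an unbiased likelihood estimate, which is plugged into an otherwise standard Metropolis-Hastings move on the parameter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy state-space model: x_t = phi * x_{t-1} + N(0,1), y_t = x_t + N(0,1).
T, phi_true = 100, 0.7
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + rng.normal()
y = x + rng.normal(size=T)

def log_norm(u, s2):
    # log density of N(0, s2) at u
    return -0.5 * (np.log(2 * np.pi * s2) + u**2 / s2)

def bootstrap_loglik(phi, y, n_part=200):
    """Bootstrap-filter estimate of log p(y | phi), unbiased on the likelihood scale."""
    particles = rng.normal(size=n_part)
    loglik = 0.0
    for t in range(len(y)):
        particles = phi * particles + rng.normal(size=n_part)   # propagate
        logw = log_norm(y[t] - particles, 1.0)                  # weight by p(y_t | x_t)
        c = logw.max()
        w = np.exp(logw - c)
        loglik += c + np.log(w.mean())
        # Multinomial resampling.
        idx = rng.choice(n_part, size=n_part, p=w / w.sum())
        particles = particles[idx]
    return loglik

# Particle marginal Metropolis-Hastings on phi, flat prior on (-1, 1),
# symmetric random-walk proposal: accept on the ratio of likelihood estimates.
n_iter, step = 2000, 0.1
phi, ll = 0.0, bootstrap_loglik(0.0, y)
chain = np.empty(n_iter)
for i in range(n_iter):
    prop = phi + step * rng.normal()
    if abs(prop) < 1:
        ll_prop = bootstrap_loglik(prop, y)
        if np.log(rng.uniform()) < ll_prop - ll:
            phi, ll = prop, ll_prop
    chain[i] = phi

print(chain[500:].mean())   # posterior mean of phi, roughly near 0.7
```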