**W**hen checking the Python t distribution random generator, np.random.standard_t(), I came upon this manual page, which actually does not explain how the random generator works but spends instead the whole page to recall Gosset’s *t* test, illustrating its use on an energy intake of 11 women, but ending up misleading the readers by interpreting a .009 one-sided p-value as meaning “the null hypothesis [on the hypothesised mean] has a probability of about 99% of being true”! Actually, Python’s standard deviation estimator x.std() further returns by default a non-standard standard deviation, dividing by n rather than n-1…

## Archive for p-value

## a null hypothesis with a 99% probability to be true…

Posted in Books, R, Statistics, University life with tags cross validated, hypothesis testing, np.random.standard_t, null hypothesis, numpy, p-value, pseudo-random generator, Python, R, t distribution, W. Gosset, x.std on March 28, 2018 by xi'an## how many academics does it take to change… a p-value threshold?

Posted in Books, pictures, Running, Statistics, Travel with tags .005, Bayes factor, Bayesian p-values, Nature, Nature Human Behaviour, p-value, PNAS, PsyArXiv, Valen Johnson on August 22, 2017 by xi'an

“…a critical mass of researchers now endorse this change.”

**T**he answer to the lightpulp question seems to be 72: Andrew sent me a short paper recently PsyarXived and to appear in *Nature Human Behaviour* following on the .005 not .05 tune we criticised in PNAS a while ago. (Actually a very short paper once the names and affiliations of all authors are taken away.) With indeed 72 authors, many of them my Bayesian friends! I figure the mass signature is aimed at convincing users of p-values of a consensus among statisticians. Or a “critical mass” as stated in the note. On the next week, Nature had an entry on this proposal. (With a survey on whether the p-value threshold should change!)

The argument therein [and hence my reservations] is about the same as in Val Johnson’s original PNAS paper, namely that .005 should become the reference cutoff when using p-values for discovering new effects. The tone of the note is mostly Bayesian in that it defends the Bayes factor as a better alternative I would call the b-value. And produces graphs that relate p-values to some minimax Bayes factors. In the simplest possible case of testing for the nullity of a normal mean. Which I do not think is particularly convincing when considering more realistic settings with (many) nuisance parameters and possible latent variables where numerical answers diverge between p-values and [an infinity of] b-values. And of course the unsolved issue of scaling the Bayes factor. (This without embarking anew upon a full-fledged criticism of the Bayes factor.) As usual, I am also skeptical of mentions of power, since I never truly understood the point of power, which depends on the alternative model, increasingly so with the complexity of this alternative. As argued in our letter to PNAS, the central issue that this proposal fails to address is the urgency in abandoning the notion [indoctrinated in generations of students that a single quantity and a single bound are the answers to testing issues. Changing the bound sounds like suggesting to paint afresh a building on the verge of collapsing.

## Lindley’s paradox as a loss of resolution

Posted in Books, pictures, Statistics with tags AIC, Bayesian inference, Bayesian tests of hypotheses, BIC, Cross Validation, Inferential Models, Jeffreys-Lindley paradox, p-value, pseudo-Bayes factors on November 9, 2016 by xi'an

“The principle of indifference states that in the absence of prior information, all mutually exclusive models should be assigned equal prior probability.”

**C**olin LaMont and Paul Wiggins arxived a paper on Lindley’s paradox a few days ago. The above quote is the (standard) argument for picking (½,½) partition between the two hypotheses, which I object to if only because it does not stand for multiple embedded models. The main point in the paper is to argue about the loss of resolution induced by averaging against the prior, as illustrated by the picture above for the N(0,1) versus N(μ,1) toy problem. What they call resolution is the lowest possible mean estimate for which the null is rejected by the Bayes factor (assuming a rejection for Bayes factors larger than 1). While the detail is missing, I presume the different curves on the lower panel correspond to different choices of L when using U(-L,L) priors on μ… The “Bayesian rejoinder” to the Lindley-Bartlett paradox (p.4) is in tune with my interpretation, namely that as the prior mass under the alternative gets more and more spread out, there is less and less prior support for reasonable values of the parameter, hence a growing tendency to accept the null. This is an illustration of the long-lasting impact of the prior on the posterior probability of the model, because the data cannot impact the tails very much.

“If the true prior is known, Bayesian inference using the true prior is optimal.”

This sentence and the arguments following is meaningless in my opinion as knowing the “true” prior makes the Bayesian debate superfluous. If there was a unique, Nature provided, known prior π, it would loose its original meaning to become part of the (frequentist) model. The argument is actually mostly used in negative, namely that since it is not know we should not follow a Bayesian approach: this is, e.g., the main criticism in Inferential Models. But there is no such thing as a “true” prior! (Or a “true’ model, all things considered!) In the current paper, this pseudo-natural approach to priors is utilised to justify a return to the pseudo-Bayes factors of the 1990’s, when one part of the data is used to stabilise and proper-ise the (improper) prior, and a second part to run the test *per se*. This includes an interesting insight on the limiting cases of partitioning corresponding to AIC and BIC, respectively, that I had not seen before. With the surprising conclusion that “AIC is the derivative of BIC”!

## Measuring statistical evidence using relative belief [book review]

Posted in Books, Statistics, University life with tags ABC, Bayes factor, CHANCE, CRC Press, discrepancies, Error and Inference, improper prior, integrated likelihood, Jeffreys-Lindley paradox, Likelihood Principle, marginalisation paradoxes, model checking, model validation, Monty Hall problem, Murray Aitkin, p-value, point null hypotheses, relative belief ratio, University of Toronto on July 22, 2015 by xi'an

“It is necessary to be vigilant to ensure that attempts to be mathematically general do not lead us to introduce absurdities into discussions of inference.” (p.8)

**T**his new book by Michael Evans (Toronto) summarises his views on statistical evidence (expanded in a large number of papers), which are a quite unique mix of Bayesian principles and less-Bayesian methodologies. I am quite glad I could receive a version of the book before it was published by CRC Press, thanks to Rob Carver (and Keith O’Rourke for warning me about it).* [Warning: this is a rather long review and post, so readers may chose to opt out now!]*

“The Bayes factor does not behave appropriately as a measure of belief, but it does behave appropriately as a measure of evidence.” (p.87)

## a refutation of Johnson’s PNAS paper

Posted in Books, Statistics, University life with tags Alice and Bob, Bayes factor, Bayesian statistics, Bayesian tests, hypothesis testing, p-value, Valen Johnson, xkcd on February 11, 2014 by xi'an**J**ean-Christophe Mourrat recently arXived a paper “P-value tests and publication bias as causes for high rate of non-reproducible scientific results?”, intended as a rebuttal of Val Johnson’s PNAS paper. The arguments therein are not particularly compelling. (Just as ours’ may sound so to the author.)

“We do not discuss the validity of this[Bayesian]hypothesis here, but we explain in the supplementary material that if taken seriously, it leads to incoherent results, and should thus be avoided for practical purposes.”

**T**he refutation is primarily argued as a rejection of the whole Bayesian perspective. (Although we argue Johnson’ perspective is not that Bayesian…) But the argument within the paper is much simpler: if the probability of rejection under the null is at most 5%, then the overall proportion of false positives is also at most 5% and not 20% as argued in Johnson…! Just as simple as this. Unfortunately, the author mixes conditional and unconditional, frequentist and Bayesian probability models. As well as conditioning upon the data and conditioning upon the rejection region… Read at your own risk. Continue reading

## Truly random [again]

Posted in Books, R, Statistics, University life with tags Bell's theorem, La Recherche, Nature, p-value, PCP theorem, quantum mechanics, realism, statistical tests, Stephen Hawking on December 10, 2010 by xi'an

“The measurement outputs contain at the 99% confidence level 42 new random bits. This is a much stronger statement than passing or not passing statistical tests, which merely indicate that no obvious non-random patterns are present.”arXiv:0911.3427

**A**s often, I bought ** La Recherche** in the station newsagent for the wrong reason! The cover of the December issue was about “God and Science” and I thought this issue would bring some interesting and deep arguments in connection with my math and realism post. The debate is very short, does not go in any depth. reproduces the Hawking’s quote that started the earlier post, and recycles the same graph about cosmology I used last summer in Vancouver! However, there are alternative interesting entries about probabilistic proof checking in Mathematics and truly random numbers… The first part is on an ACM paper on the PCP theorem by Irit Dinur, but is too terse as is (while the theory behind presumably escapes my abilities!). The second part is about a paper in

**published by Pironio et al. and arXived as well. It is entitled**

*Nature**“Random numbers certified by Bell’s Theorem”*and also is one of the laureates of the

**prize this year. I was first annoyed by the French coverage of the paper, mentioning that “a number was random with a probability of 99%” (?!) and that “a sequence of numbers is perfectly random” (re-?!). The original paper is however stating the same thing, hence stressing the different meaning associated to randomness by those physicists, “the unpredictable character of the outcomes” and “universally-composable security”. The above “probability of randomness” is actually a p-value (associated with the null hypothesis that Bell’s inequality is not violated) that is equal to 0.00077. (So the above quote is somehow paradoxical!) The huge apparatus used to produce those random events is not very efficient: on average, 7 binary random numbers are detected per hour… A far cry from the “truly random” generator produced by Intel!**

*La Recherche***Ps-A**s a concidence, Julien Cornebise pointed out to me that there is a supplement in the journal about ** “Le Savoir du Corps”** which is in fact handled by the pharmaceutical company Servier, currently under investigation for its drug Mediator… A very annoying breach of basic journalistic ethics in my opinion!

## Random sudokus [p-values]

Posted in R, Statistics with tags combinatorics, entropy, Kullback, Monte Carlo, p-value, simulation, sudoku, uniformity on May 21, 2010 by xi'an**I** reran the program checking the distribution of the digits over 9 “diagonals” (obtained by acceptable permutations of rows and column) and this test again results in mostly small p-values. Over a million iterations, and the nine (dependent) diagonals, four p-values were below 0.01, three were below 0.1, and two were above (0.21 and 0.42). So I conclude in a discrepancy between my (full) sudoku generator and the hypothesised distribution of the (number of different) digits over the diagonal. Assuming my generator is a faithful reproduction of the one used in the paper by Newton and DeSalvo, this discrepancy suggests that their distribution over the sudoku grids do not agree with this diagonal distribution, either because it is actually different from uniform or, more likely, because the uniform distribution I use over the (groups of three over the) diagonal is not compatible with a uniform distribution over all sudokus…