## not an ASA’s statement on p-values

Posted in Books, Kids, Statistics, University life with tags , , , , on March 18, 2016 by xi'an

This may be a coincidence, but a few days after the ASA statement got published, Yuri Gurevich and Vladimir Vovk arXived a note on the Fundamentals of p-values. Which actually does not contribute to the debate. The paper is written in a Q&A manner. And defines a sort of peculiar logic related with [some] p-values. A second and more general paper is in the making, which may shed more light on the potential appeal of this formalism…

## ASA’s statement on p-values [#2]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , on March 9, 2016 by xi'an

It took a visit on FiveThirtyEight to realise the ASA statement I mentioned yesterday was followed by individual entries from most members of the panel, much more diverse and deeper than the statement itself! Without discussing each and all comments, some points I subscribe to

• it does not make sense to try to replace the p-value and the 5% boundary by something else but of the same nature. This was the main line of our criticism of Valen Johnson’s PNAS paper with Andrew.
• it does not either make sense to try to come up with a hard set answer about whether or not a certain parameter satisfies a certain constraint. A comparison of predictive performances at or around the observed data sounds much more sensible, if less definitive.
• the Bayes factor is often advanced as a viable alternative to the p-value in those comments, but it suffers from difficulties exposed in our recent testing by mixture paper, one being the lack of absolute scale.
• we seem unable to escape the landscape set by Neyman and Pearson when constructing their testing formalism, including the highly unrealistic 0-1 loss function. And the grossly asymmetric opposition between null and alternative hypotheses.
• the behaviour of any procedure of choice should be evaluated under different scenarios, most likely by simulation, including some accounting for misspecified models. Which may require an extra bit of non-parametrics. And we should abstain from considering further than evaluating whether or not the data looks compatible with each of the scenarios. Or how much through the mixture representation.

## ASA’s statement on p-values

Posted in Books, Statistics, University life with tags , , , , , on March 8, 2016 by xi'an

Last night I received an email from the ASA signed by Jessica Utts and Ron Wasserstein with the following sentence

“Widespread use of ‘statistical significance’ (generally interpreted as ‘p

In short, we envision a new era, in which the broad scientific community recognizes what statisticians have been advocating for many years. In this “post p

Is such an era beyond reach? We think not, but we need your help in making sure this opportunity is not lost.”

which is obviously missing important bits. The email was pointing out a free access American Statistician article warning about the misuses and over-interpretations of p-values. Which contains rather basic “principles” that p-values are not probabilities that the null is true, that there is no golden level against which to compare the p-value, that nominal p-values may be far from actual p-values, that they do not provide a measure of evidence per se, &tc. As written in the conclusion, “Nothing in the ASA statement is new”. But, besides calling for caution and the cumulative use of different assessments of evidence, this statement may leave the non-statistician completely nonplussed about how to proceed when testing hypotheses or comparing models. And make the decision of Basic and Applied Social Psychology of rejecting all arguments based on p-values sound sensible.

Incidentally, the article contains the completion of the first sentence [in red below], if not of the second:

“Widespread use of ‘statistical significance’ (generally interpreted as ‘p≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.

## should I run less?!

Posted in Running, Statistics with tags , , , on February 10, 2015 by xi'an

A study [re]published three days ago in both The New York Times and the BBC The Guardian reproduced the conclusion of an article in the Journal of the American College of Cardiology that strenuous and long-distance jogging (or more appropriately running) could have a negative impact on longevity! And that the best pace is around 8km/h, just above a brisk walk! Quite depressing… However, this was quickly followed by other articles, including this one in The New York Times, pointing out the lack of statistical validation in the study and the ridiculously small number of runners in the study. I am already feeling  better (and ready for my long run tomorrow morning!), but appalled all the same by the lack of standards of journals publishing statistically void studies. I know, nothing new there…

## independent component analysis and p-values

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , on September 8, 2014 by xi'an

Last morning at the neuroscience workshop Jean-François Cardoso presented independent component analysis though a highly pedagogical and enjoyable tutorial that stressed the geometric meaning of the approach, summarised by the notion that the (ICA) decomposition

$X=AS$

of the data X seeks both independence between the columns of S and non-Gaussianity. That is, getting as away from Gaussianity as possible. The geometric bits came from looking at the Kullback-Leibler decomposition of the log likelihood

$-\mathbb{E}[\log L(\theta|X)] = KL(P,Q_\theta) + \mathfrak{E}(P)$

where the expectation is computed under the true distribution P of the data X. And Qθ is the hypothesised distribution. A fine property of this decomposition is a statistical version of Pythagoreas’ theorem, namely that when the family of Qθ‘s is an exponential family, the Kullback-Leibler distance decomposes into

$KL(P,Q_\theta) = KL(P,Q_{\theta^0}) + KL(Q_{\theta^0},Q_\theta)$

where θ⁰ is the expected maximum likelihood estimator of θ. (We also noticed this possibility of a decomposition in our Kullback-projection variable-selection paper with Jérôme Dupuis.) The talk by Aapo Hyvärinen this morning was related to Jean-François’ in that it used ICA all the way to a three-level representation if oriented towards natural vision modelling in connection with his book and the paper on unormalised models recently discussed on the ‘Og.

On the afternoon, Eric-Jan Wagenmaker [who persistently and rationally fight the (ab)use of p-values and who frequently figures on Andrew’s blog] gave a warning tutorial talk about the dangers of trusting p-values and going fishing for significance in existing studies, much in the spirit of Andrew’s blog (except for the defence of Bayes factors). Arguing in favour of preregistration. The talk was full of illustrations from psychology. And included the line that ESP testing is the jester of academia, meaning that testing for whatever form of ESP should be encouraged as a way to check testing procedures. If a procedure finds a significant departure from the null in this setting, there is something wrong with it! I was then reminded that Eric-Jan was one of the authors having analysed Bem’s controversial (!) paper on the “anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms”… (And of the shocking talk by Jessica Utts on the same topic I attended in Australia two years ago.)

## MCMSki IV [day 3]

Posted in Mountains, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on January 9, 2014 by xi'an

Already on the final day..! And still this frustration in being unable to attend three sessions at once… Andrew Gelman started the day with a non-computational talk that broached on themes that are familiar to readers of his blog, on the misuse of significance tests and on recommendations for better practice. I then picked the Scaling and optimisation of MCMC algorithms session organised by Gareth Roberts, with optimal scaling talks by Tony Lelièvre, Alex Théry and Chris Sherlock, while Jochen Voss spoke about the convergence rate of ABC, a paper I already discussed on the blog. A fairly exciting session showing that MCMC’ory (name of a workshop I ran in Paris in the late 90’s!) is still well and alive!

After the break (sadly without the ski race!), the software round-table session was something I was looking for. The four softwares covered by this round-table were BUGS, JAGS, STAN, and BiiPS, each presented according to the same pattern. I would have like to see a “battle of the bands”, illustrating pros & cons for each language on a couple of models & datasets. STAN got the officious prize for cool tee-shirts (we should have asked the STAN team for poster prize tee-shirts). And I had to skip the final session for a flu-related doctor appointment…

I called for a BayesComp meeting at 7:30, hoping for current and future members to show up and discuss the format of the future MCMski meetings, maybe even proposing new locations on other “sides of the Italian Alps”! But (workshop fatigue syndrome?!), no-one showed up. So anyone interested in discussing this issue is welcome to contact me or David van Dyk, the new BayesComp program chair.

## Statistical evidence for revised standards

Posted in Statistics, University life with tags , , , , , , , , , on December 30, 2013 by xi'an

In yet another permutation of the original title (!), Andrew Gelman posted the answer Val Johnson sent him after our (submitted)  letter to PNAS. As Val did not send me a copy (although Andrew did!), I will not reproduce it here and I rather refer the interested readers to Andrews’ blog… In addition to Andrew’s (sensible) points, here are a few idle (post-X’mas and pre-skiing) reflections:

• “evidence against a false null hypothesis accrues exponentially fast” makes me wonder in which metric this exponential rate (in γ?) occurs;
• that “most decision-theoretic analyses of the optimal threshold to use for declaring a significant finding would lead to evidence thresholds that are substantially greater than 5 (and probably also greater 25)” is difficult to accept as an argument since there is no trace of a decision-theoretic argument in the whole paper;
• Val rejects our minimaxity argument on the basis that “[UMPBTs] do not involve minimization of maximum loss” but the prior that corresponds to those tests is minimising the integrated probability of not rejecting at threshold level γ, a loss function integrated against parameter and observation, a Bayes risk in other words… Point masses or spike priors are clearly characteristics of minimax priors. Furthermore, the additional argument that “in most applications, however, a unique loss function/prior distribution combination does not exist” has been used by many to refute the Bayesian perspective and makes me wonder what are the arguments left in using a (pseudo-)Bayesian approach;
• the next paragraph is pure tautology: the fact that “no other test, based on either a subjectively or objectively specified alternative hypothesis, is as likely to produce a Bayes factor that exceeds the specified evidence threshold” is a paraphrase of the definition of UMPBTs, not an argument. I do not see we should solely “worry about false negatives”, since minimising those should lead to a point mass on the null (or, more seriously, should not lead to the minimax-like selection of the prior under the alternative).