Archive for p-values

more air for MCMC

Posted in Books, R, Statistics with tags , , , , , , , , , , , , , , on May 30, 2021 by xi'an

Aki Vehtari, Andrew Gelman, Dan Simpson, Bob Carpenter, and Paul-Christian Bürkner have just published a Bayesian Analysis paper about using an improved R factor for MCMC convergence assessment. From the early days of MCMC, convergence assessment has been a recurring (and recurrent!) question in the community. First leading to a flurry of proposals, [which Kerrie, Chantal, and myself reviewwwed in the Valencia 1998 proceedings], and then slowly disintegrating under the onslaughts of reality—i.e. that none could not be 100% foolproof in full generality—…. This included the (possibly now forgotten) single-versus-multiple-chains debate between Charlie Geyer [for single] and Andrew Gelman and Don Rubin [for multiple]. The later introduced an analysis-of-variance R factor, which remains quite popular up to this day, in part for being part of most MCMC software, like BUGS. That this R may fail to identify convergence issues, even in the more recent split version, does not come as a major surprise, since any situation with a long-term influence of the starting distribution may well fail to identify missing (significant) parts of the posterior support. (It is thus somewhat disconcerting to me to see that the main recommendation is to move the bound on R from 1.1 to 1.01, reminding me to some extent of a recent proposal to move the null rejection boundary from 0.05 to 0.005…) Similarly, the ESS may prove a poor signal for convergence or lack thereof, especially because the approximation of the asymptotic variance relies on stationarity assumptions. While multiplying the monitoring tools (as in CODA) helps with identifying convergence issues, looking at a single convergence indicator is somewhat like looking only at a frequentist estimator! (And with greater automation comes greater responsibility—in keeping a critical perspective.)

Looking for a broader perspective, I thus wonder at what we would instead need to assess the lack of convergence of an MCMC chain without much massaging of the said chain. An evaluation of the (Kullback, Wasserstein, or else) distance between the distribution of the chain at iteration n or across iterations, and the true target? A percentage of the mass of the posterior visited so far, which relates to estimating the normalising constant, with a relatively vast array of solutions made available in the recent years? I remain perplexed and frustrated by the fact that, 30 years later, the computed values of the visited likelihoods are not better exploited. Through for instance machine-learning approximations of the target. that could themselves be utilised for approximating the normalising constant and potential divergences from other approximations.

abandoned, one year ago…

Posted in Books, Statistics, University life with tags , , , , on March 17, 2020 by xi'an

Nature tidbits [the Bayesian brain]

Posted in Statistics with tags , , , , , , , , , , , , , , on March 8, 2020 by xi'an

In the latest Nature issue, a long cover of Asimov’s contributions to science and rationality. And a five page article on the dopamine reward in the brain seen as a probability distribution, seen as distributional reinforcement learning by researchers from DeepMind, UCL, and Harvard. Going as far as “testing” for this theory with a p-value of 0.008..! Which could be as well a signal of variability between neurons to dopamine rewards (with a p-value of 10⁻¹⁴, whatever that means). Another article about deep learning about protein (3D) structure prediction. And another one about learning neural networks via specially designed devices called memristors. And yet another one on West Africa population genetics based on four individuals from the Stone to Metal age (8000 and 3000 years ago), SNPs, PCA, and admixtures. With no ABC mentioned (I no longer have access to the journal, having missed renewal time for my subscription!). And the literal plague of a locust invasion in Eastern Africa. Making me wonder anew as to why proteins could not be recovered from the swarms of locust to partly compensate for the damages. (Locusts eat their bodyweight in food every day.) And the latest news from NeurIPS about diversity and inclusion. And ethics, as in checking for responsibility and societal consequences of research papers. Reviewing the maths of a submitted paper or the reproducibility of an experiment is already challenging at times, but evaluating the biases in massive proprietary datasets or the long-term societal impact of a classification algorithm may prove beyond the realistic.

p-values, Bayes factors, and sufficiency

Posted in Books, pictures, Statistics with tags , , , , , , , , , on April 15, 2019 by xi'an

Among the many papers published in this special issue of TAS on statistical significance or lack thereof, there is a paper I had already read before (besides ours!), namely the paper by Jonty Rougier (U of Bristol, hence the picture) on connecting p-values, likelihood ratio, and Bayes factors. Jonty starts from the notion that the p-value is induced by a transform, summary, statistic of the sample, t(x), the larger this t(x), the less likely the null hypothesis, with density f⁰(x), to create an embedding model by exponential tilting, namely the exponential family with dominating measure f⁰, and natural statistic, t(x), and a positive parameter θ. In this embedding model, a Bayes factor can be derived from any prior on θ and the p-value satisfies an interesting double inequality, namely that it is less than the likelihood ratio, itself lower than any (other) Bayes factor. One novel aspect from my perspective is that I had thought up to now that this inequality only holds for one-dimensional problems, but there is no constraint here on the dimension of the data x. A remark I presumably made to Jonty on the first version of the paper is that the p-value itself remains invariant under a bijective increasing transform of the summary t(.). This means that there exists an infinity of such embedding families and that the bound remains true over all such families, although the value of this minimum is beyond my reach (could it be the p-value itself?!). This point is also clear in the justification of the analysis thanks to the Pitman-Koopman lemma. Another remark is that the perspective can be inverted in a more realistic setting when a genuine alternative model M¹ is considered and a genuine likelihood ratio is available. In that case the Bayes factor remains smaller than the likelihood ratio, itself larger than the p-value induced by the likelihood ratio statistic. Or its log. The induced embedded exponential tilting is then a geometric mixture of the null and of the locally optimal member of the alternative. I wonder if there is a parameterisation of this likelihood ratio into a p-value that would turn it into a uniform variate (under the null). Presumably not. While the approach remains firmly entrenched within the realm of p-values and Bayes factors, this exploration of a natural embedding of the original p-value is definitely worth mentioning in a class on the topic! (One typo though, namely that the Bayes factor is mentioned to be lower than one, which is incorrect.)

abandon ship [value]!!!

Posted in Books, Statistics, University life with tags , , , , , , , , , on March 22, 2019 by xi'an

The Abandon Statistical Significance paper we wrote with Blakeley B. McShane, David Gal, Andrew Gelman, and Jennifer L. Tackett has now appeared in a special issue of The American Statistician, “Statistical Inference in the 21st Century: A World Beyond p < 0.05“.  A 400 page special issue with 43 papers available on-line and open-source! Food for thought likely to be discussed further here (and elsewhere). The paper and the ideas within have been discussed quite a lot on Andrew’s blog and I will not repeat them here, simply quoting from the conclusion of the paper

In this article, we have proposed to abandon statistical significance and offered recommendations for how this can be implemented in the scientific publication process as well as in statistical decision making more broadly. We reiterate that we have no desire to “ban” p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors.

Which also introduced in a comment by Valentin Amrhein, Sander Greenland, and Blake McShane published in Nature today (and supported by 800+ signatures). Again discussed on Andrew’s blog.