## evolution of correlations [award paper]

Posted in Books, pictures, Statistics, University life with tags , , , , , on September 15, 2015 by xi'an

“Many researchers might have observed that the magnitude of a correlation is pretty unstable in small samples.”

On the statsblog aggregator, I spotted an entry that eventually led me to this post about the best paper award for the evolution of correlation, a paper published in the Journal of Research in Personality. A journal not particularly well-known for its statistical methodology input. The main message of the paper is that, while the empirical correlation is highly varying for small n’s, an interval (or corridor of stability!) can be constructed so that a Z-transform of the correlation does not vary away from the true value by more than a chosen quantity like 0.1. And the point of stability is then defined as the sample size after which the trajectory of the estimate does not leave the corridor… Both corridor and point depending on the true and unknown value of the correlation parameter by the by. Which implies resorting to bootstrap to assess the distribution of this point of stability. And deduce quantiles that can be used for… For what exactly?! Setting the necessary sample size? But this requires a preliminary run to assess the possible value of the true correlation ρ. The paper concludes that “for typical research scenarios reasonable trade-offs between accuracy and confidence start to be achieved when n approaches 250”. This figure was achieved by a bootstrap study on a bivariate Gaussian population with 10⁶ datapoints, yes indeed 10⁶!, and bootstrap samples of maximal size 10³. All in all, while I am at a loss as to why the Journal of Research in Personality would promote the estimation of a correlation coefficient with 250 datapoints, there is nothing fundamentally wrong with the paper (!), except for this recommendation of the 250 datapoint, as the figure stems from a specific setting with particular calibrations and cannot be expected to apply in every and all cases.

Actually, the graph in the paper was the thing that first attracted my attention because it looks very much like the bootstrap examples I show my third year students to demonstrate the appeal of bootstrap. Which is not particularly useful in the current case. A quick simulation on 100 samples of size 300 showed [above] that Monte Carlo simulations produce a tighter confidence band than the one created by bootstrap, in the Gaussian case. Continue reading

## testing via credible sets

Posted in Statistics, University life with tags , , , , , , , , , , , on October 8, 2012 by xi'an

Måns Thulin released today an arXiv document on some decision-theoretic justifications for [running] Bayesian hypothesis testing through credible sets. His main point is that using the unnatural prior setting mass on a point-null hypothesis can be avoided by rejecting the null when the point-null value of the parameter does not belong to the credible interval and that this decision procedure can be validated through the use of special loss functions. While I stress to my students that point-null hypotheses are very unnatural and should be avoided at all cost, and also that constructing a confidence interval is not the same as designing a test—the former assess the precision in the estimation, while the later opposes two different and even incompatible models—, let us consider Måns’ arguments for their own sake.

The idea of the paper is that there exist loss functions for testing point-null hypotheses that lead to HPD, symmetric and one-sided intervals as acceptance regions, depending on the loss func. This was already found in Pereira & Stern (1999). The issue with these loss functions is that they involve the corresponding credible sets in their definition, hence are somehow tautological. For instance, when considering the HPD set and T(x) as the largest HPD set not containing the point-null value of the parameter, the corresponding loss function is

$L(\theta,\varphi,x) = \begin{cases}a\mathbb{I}_{T(x)^c}(\theta) &\text{when }\varphi=0\\ b+c\mathbb{I}_{T(x)}(\theta) &\text{when }\varphi=1\end{cases}$

parameterised by a,b,c. And depending on the HPD region.

Måns then introduces new loss functions that do not depend on x and still lead to either the symmetric or the one-sided credible intervals.as acceptance regions. However, one test actually has two different alternatives (Theorem 2), which makes it essentially a composition of two one-sided tests, while the other test returns the result to a one-sided test (Theorem 3), so even at this face-value level, I do not find the result that convincing. (For the one-sided test, George Casella and Roger Berger (1986) established links between Bayesian posterior probabilities and frequentist p-values.) Both Theorem 3 and the last result of the paper (Theorem 4) use a generic and set-free observation-free loss function (related to eqn. (5.2.1) in my book!, as quoted by the paper) but (and this is a big but) they only hold for prior distributions setting (prior) mass on both the null and the alternative. Otherwise, the solution is to always reject the hypothesis with the zero probability… This is actually an interesting argument on the why-are-credible-sets-unsuitable-for-testing debate, as it cannot bypass the introduction of a prior mass on Θ0!

Overall, I furthermore consider that a decision-theoretic approach to testing should encompass future steps rather than focussing on the reply to the (admittedly dumb) question is θ zero? Therefore, it must have both plan A and plan B at the ready, which means preparing (and using!) prior distributions under both hypotheses. Even on point-null hypotheses.

Now, after I wrote the above, I came upon a Stack Exchange page initiated by Måns last July. This is presumably not the first time a paper stems from Stack Exchange, but this is a fairly interesting outcome: thanks to the debate on his question, Måns managed to get a coherent manuscript written. Great! (In a sense, this reminded me of the polymath experiments of Terry Tao, Timothy Gower and others. Meaning that maybe most contributors could have become coauthors to the paper!)