Archive for empirical correlation

re-revisiting Jeffreys

Posted in Books, pictures, Statistics, Travel, University life on October 16, 2015 by xi'an

Analytic Posteriors for Pearson’s Correlation Coefficient was arXived yesterday by Alexander Ly, Maarten Marsman, and Eric-Jan Wagenmakers from Amsterdam, with whom I recently had two most enjoyable encounters (and dinners!), and whose paper on Jeffreys’ Theory of Probability I recently discussed in the Journal of Mathematical Psychology.

The paper re-analyses Bayesian inference on the Gaussian correlation coefficient, demonstrating that, for standard reference priors, the posterior moments are (surprisingly) available in closed form. These priors include those suggested by Jeffreys (in a 1935 paper), Lindley, Bayarri (Susie’s first paper!), Berger, Bernardo, and Sun, and they are all of the form

\pi(\theta)\propto(1+\rho^2)^\alpha(1-\rho^2)^\beta\sigma_1^\gamma\sigma_2^\delta

and the corresponding profile likelihood on ρ is in “closed” form (“closed” because it involves hypergeometric functions). It depends on the data only through the sample correlation, which is then marginally sufficient (although I do not like this notion!). The posterior moments associated with those priors can be expressed as series (of hypergeometric functions). While the paper is very technical, borrowing from the Bateman project and from Gradshteyn and Ryzhik, I like it, if only because it reminds me of some early papers I wrote in the same vein, Abramowitz and Stegun being one of the very first books I bought (at a ridiculous price in the bookstore of Purdue University…).
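To give a feel for what such a “closed” form looks like, here is, for reference only, the classical exact sampling density of the sample correlation r for n observations from a bivariate Gaussian with correlation ρ. It is not the paper’s posterior expression, just a textbook formula with the same hypergeometric structure: viewed as a function of ρ, it is the likelihood based on r alone, and the data enter only through r (and n).

f(r\mid\rho)=\frac{(n-2)\,\Gamma(n-1)}{\sqrt{2\pi}\,\Gamma(n-\tfrac{1}{2})}\,(1-\rho^2)^{\frac{n-1}{2}}\,(1-r^2)^{\frac{n-4}{2}}\,(1-\rho r)^{-(n-\frac{3}{2})}\,{}_2F_1\left(\tfrac{1}{2},\tfrac{1}{2};n-\tfrac{1}{2};\tfrac{1+\rho r}{2}\right)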

Two comments about the paper. First, I see nowhere a condition for the posterior to be proper, although I assume it could be the n > 1 + γ − 2α + δ constraint found in Corollary 2.1 (even though I am surprised there is no condition on the coefficient β). The second comment is about the use of this analytic expression in simulations from the marginal posterior on ρ: since the density is available, numerical integration is certainly more efficient than Monte Carlo integration [for quantities that are not already available in closed form]. Furthermore, in the general case when β is not zero, the cost of computing infinite series of hypergeometric and gamma functions may be counterbalanced by a direct simulation of ρ and both variance parameters, since the profile likelihood of this triplet is truly in closed form, see eqn (2.11). And I will not comment on the fact that Fisher ends up being the most quoted author in the paper!
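To illustrate the point about numerical versus Monte Carlo integration, here is a minimal Python sketch, not the paper’s code. Assumptions not taken from the paper: the likelihood of ρ is the classical exact sampling density of r (not the paper’s expressions), the prior on ρ is uniform on (−1, 1), the values r = 0.5 and n = 30 are arbitrary, and all function names are hypothetical. The posterior mean of ρ is computed once by composite Simpson quadrature and once by self-normalised importance sampling with a uniform proposal; the hypergeometric function is summed directly from its series to avoid any library dependency.

```python
import math
import random


def hyp2f1(a, b, c, z, tol=1e-15, max_terms=100_000):
    """Gauss hypergeometric function 2F1(a, b; c; z), summed from its power series (|z| < 1)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        # Ratio of consecutive series terms: (a+k)(b+k) z / ((c+k)(k+1)).
        term *= (a + k) * (b + k) / ((c + k) * (k + 1.0)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def log_lik(rho, r, n):
    """Log of the exact sampling density of r as a function of rho (rho-free constants dropped):
    (n-1)/2 log(1-rho^2) - (n-3/2) log(1-rho r) + log 2F1(1/2, 1/2; n-1/2; (1+rho r)/2)."""
    return (0.5 * (n - 1) * math.log1p(-rho * rho)
            - (n - 1.5) * math.log1p(-rho * r)
            + math.log(hyp2f1(0.5, 0.5, n - 0.5, 0.5 * (1.0 + rho * r))))


def posterior_mean_quadrature(r, n, m=2000):
    """Posterior mean of rho under an (assumed) uniform prior, by composite Simpson's rule, m even."""
    h = 2.0 / m
    nodes = [-1.0 + i * h for i in range(m + 1)]
    # The density vanishes at rho = -1 and rho = 1, so only interior nodes are evaluated.
    logs = [log_lik(x, r, n) for x in nodes[1:-1]]
    top = max(logs)  # rescale before exponentiating to avoid overflow
    dens = [0.0] + [math.exp(v - top) for v in logs] + [0.0]
    weights = [1.0 if i in (0, m) else (4.0 if i % 2 else 2.0) for i in range(m + 1)]
    mass = sum(w * d for w, d in zip(weights, dens))
    first = sum(w * d * x for w, d, x in zip(weights, dens, nodes))
    return first / mass  # the common factor h / 3 cancels in the ratio


def posterior_mean_importance(r, n, n_draws=50_000, seed=1):
    """Same posterior mean by self-normalised importance sampling with a Uniform(-1, 1) proposal."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n_draws:
        x = rng.uniform(-1.0, 1.0)
        if abs(x) < 1.0:  # guard against the (measure-zero) endpoint -1.0
            draws.append(x)
    logs = [log_lik(x, r, n) for x in draws]
    top = max(logs)
    # Uniform prior and uniform proposal: importance weights are proportional to the likelihood.
    w = [math.exp(v - top) for v in logs]
    return sum(wi * xi for wi, xi in zip(w, draws)) / sum(w)


if __name__ == "__main__":
    r, n = 0.5, 30  # arbitrary illustrative values
    print("quadrature:         ", posterior_mean_quadrature(r, n))
    print("importance sampling:", posterior_mean_importance(r, n))
```

With about two thousand density evaluations the quadrature estimate is accurate to many digits, while the importance-sampling estimate carries a Monte Carlo error of order N^(-1/2) even with tens of thousands of evaluations.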

evolution of correlations [award paper]

Posted in Books, pictures, Statistics, University life on September 15, 2015 by xi'an

“Many researchers might have observed that the magnitude of a correlation is pretty unstable in small samples.”

On the statsblog aggregator, I spotted an entry that eventually led me to this post about the best paper award given to a paper on the evolution of correlations, published in the Journal of Research in Personality, a journal not particularly well-known for its statistical methodology input. The main message of the paper is that, while the empirical correlation varies widely for small n’s, an interval (or corridor of stability!) can be constructed so that a Z-transform of the correlation does not move away from the true value by more than a chosen quantity like 0.1. The point of stability is then defined as the sample size after which the trajectory of the estimate no longer leaves the corridor… Both corridor and point depend on the true and unknown value of the correlation parameter, by the by, which implies resorting to bootstrap to assess the distribution of this point of stability, and deducing quantiles that can be used for… For what exactly?! Setting the necessary sample size? But this requires a preliminary run to assess the possible value of the true correlation ρ. The paper concludes that “for typical research scenarios reasonable trade-offs between accuracy and confidence start to be achieved when n approaches 250”. This figure was obtained by a bootstrap study on a bivariate Gaussian population with 10⁶ datapoints, yes indeed 10⁶!, and bootstrap samples of maximal size 10³. All in all, while I am at a loss as to why the Journal of Research in Personality would promote the estimation of a correlation coefficient with 250 datapoints, there is nothing fundamentally wrong with the paper (!), except for this recommendation of 250 datapoints, as the figure stems from a specific setting with particular calibrations and cannot be expected to apply in each and every case.
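The corridor and the point of stability described above can be sketched in a few lines of Python. Assumptions not taken from the paper: the corridor is |atanh(r_n) − atanh(ρ)| ≤ w with w = 0.1 (following the description in this post rather than the paper’s exact definition), checking starts at a hypothetical n_min = 20 and stops at n_max = 1000, and trajectories are drawn directly from a bivariate Gaussian model (plain Monte Carlo) rather than bootstrapped from a finite population of 10⁶ points. As noted in the post, both the corridor and the resulting quantiles depend on the unknown ρ, fixed here at an illustrative value; all function names are hypothetical.

```python
import math
import random


def running_correlations(xs, ys):
    """Empirical correlation r_k of the first k pairs, k = 1..len(xs); None while undefined."""
    sx = sy = sxx = syy = sxy = 0.0
    out = []
    for k, (x, y) in enumerate(zip(xs, ys), start=1):
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
        vx = sxx - sx * sx / k  # k times the (biased) variance of x
        vy = syy - sy * sy / k
        cxy = sxy - sx * sy / k
        out.append(cxy / math.sqrt(vx * vy) if vx > 0 and vy > 0 else None)
    return out


def point_of_stability(rs, rho, w=0.1, n_min=20):
    """Smallest n >= n_min after which atanh(r_m) stays within w of atanh(rho) for every m >= n.
    Returns None if the trajectory is still outside the corridor at its last point.
    (n_min = 20 is a hypothetical starting size.)"""
    target = math.atanh(rho)
    pos = n_min
    for k in range(n_min, len(rs) + 1):
        r = rs[k - 1]
        if r is None or abs(r) >= 1.0 or abs(math.atanh(r) - target) > w:
            pos = k + 1  # left the corridor at k, so stability can only start after k
    return pos if pos <= len(rs) else None


def simulate_points_of_stability(rho, n_max=1000, n_rep=200, w=0.1, n_min=20, seed=0):
    """Points of stability for n_rep trajectories drawn directly from a bivariate Gaussian."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    points = []
    for _ in range(n_rep):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_max)]
        ys = [rho * x + s * rng.gauss(0.0, 1.0) for x in xs]
        points.append(point_of_stability(running_correlations(xs, ys), rho, w, n_min))
    return points


def empirical_quantile(points, q):
    """Empirical q-quantile (0 < q <= 1); trajectories that never stabilised count as +infinity."""
    vals = sorted(math.inf if p is None else p for p in points)
    return vals[min(len(vals) - 1, math.ceil(q * len(vals)) - 1)]


if __name__ == "__main__":
    points = simulate_points_of_stability(rho=0.3)  # rho = 0.3 is an arbitrary illustrative value
    for q in (0.8, 0.9, 0.95):
        print(q, empirical_quantile(points, q))
```

Rerunning with another value of rho shows how the quantiles, and hence any recommended sample size, move with the unknown correlation.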

Actually, the graph in the paper was the thing that first attracted my attention, because it looks very much like the bootstrap examples I show my third-year students to demonstrate the appeal of bootstrap. Bootstrap is, however, not particularly useful in the current case. A quick simulation on 100 samples of size 300 showed that Monte Carlo simulations produce a tighter confidence band than the one created by bootstrap, in the Gaussian case.
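One possible way to set up the comparison in the paragraph above, as a minimal Python sketch. The post does not say how the bootstrap band was built, so the following are assumptions: the Monte Carlo band uses 100 independent samples of size 300 from a bivariate Gaussian with an illustrative ρ = 0.5, the bootstrap band uses 100 resamples with replacement of a single observed sample of size 300, and both bands are pointwise empirical 5% and 95% quantiles of the running correlation r_k. Function names are hypothetical, and the code only prints band widths; it does not reproduce the graph from the post.

```python
import math
import random


def running_correlations(xs, ys):
    """Empirical correlation r_k of the first k pairs, k = 1..len(xs); None while undefined."""
    sx = sy = sxx = syy = sxy = 0.0
    out = []
    for k, (x, y) in enumerate(zip(xs, ys), start=1):
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
        vx, vy = sxx - sx * sx / k, syy - sy * sy / k
        cxy = sxy - sx * sy / k
        out.append(cxy / math.sqrt(vx * vy) if vx > 0 and vy > 0 else None)
    return out


def gaussian_pairs(rho, n, rng):
    """n draws from a standard bivariate Gaussian with correlation rho."""
    s = math.sqrt(1.0 - rho * rho)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + s * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def monte_carlo_trajectories(rho, n, n_rep, rng):
    """Trajectories r_1..r_n from n_rep independent samples of the model."""
    return [running_correlations(*gaussian_pairs(rho, n, rng)) for _ in range(n_rep)]


def bootstrap_trajectories(xs, ys, n_rep, rng):
    """Trajectories r_1..r_n from n_rep resamples (with replacement) of one observed sample."""
    n = len(xs)
    trajs = []
    for _ in range(n_rep):
        idx = [rng.randrange(n) for _ in range(n)]
        trajs.append(running_correlations([xs[i] for i in idx], [ys[i] for i in idx]))
    return trajs


def pointwise_band(trajs, lo=0.05, hi=0.95):
    """Approximate pointwise [lo, hi] empirical quantiles of r_k across trajectories."""
    band = []
    for k in range(len(trajs[0])):
        vals = sorted(t[k] for t in trajs if t[k] is not None)
        if not vals:  # r_k undefined in every trajectory (e.g. k = 1)
            band.append(None)
            continue
        m = len(vals) - 1
        band.append((vals[math.floor(lo * m)], vals[math.ceil(hi * m)]))
    return band


if __name__ == "__main__":
    rng = random.Random(2015)
    rho, n, n_rep = 0.5, 300, 100  # rho = 0.5 is an arbitrary illustrative value
    mc_band = pointwise_band(monte_carlo_trajectories(rho, n, n_rep, rng))
    xs, ys = gaussian_pairs(rho, n, rng)  # the single "observed" sample
    boot_band = pointwise_band(bootstrap_trajectories(xs, ys, n_rep, rng))
    for k in (20, 50, 100, 200, 300):
        (mlo, mhi), (blo, bhi) = mc_band[k - 1], boot_band[k - 1]
        print(f"n={k:3d}  MC width {mhi - mlo:.3f}  bootstrap width {bhi - blo:.3f}")
```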
