Archive for correlation

a meaningful divide?

Posted in Books, Mountains, pictures, Statistics, Travel on August 16, 2021 by xi'an

Le Monde published this map in its 26 July edition to illustrate the contrast between South-East and North and West France(s), meaning that the North-West upper part of the map is more vaccinated than the South-East lower part. The figure is computed as the sum of the differences between local and national rates, per age group, weighted by the group sizes. The paper goes on analysing the divide in terms of sociology of the territories, as well as political opposition to Président Macron… But I wonder (over breakfast) if it does not see too much in this picture. First, some districts have to be either above or below the national average. Second, the map does not incorporate the population density: very sparsely populated districts in the South-East, like Auvergne or central Corsica, are more visible than the densest areas like Greater Paris, while being more prone to low vaccination rates due to the larger distance to vaccination centres. Third, most of the districts are within ±15% of the average, which may be too large for mere statistical variation, but not by much! The geographer Emmanuel Vigneron points out in the paper an inverse correlation between vaccination and earlier COVID cases, but this is not so surprising in that people who have already been exposed to the virus may conclude they are well (enough) protected. Further, the age effect is not eliminated by the contrast, in that areas with an older population are bound to get closer to the average, given that vaccination in the older groups started earlier and was more seen as a life-or-death issue. The soundest observation is rather the opposition between urban districts where, despite an equivalent access to vaccination opportunities, the poorer burbs like the Northern districts of Marseille are the least vaccinated (with possibly an age effect?).
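To make the indicator concrete, here is a toy Python sketch of a weighted sum of local-minus-national differences per age group. All rates, district names and age-group shares are invented for illustration, and weighting by the local age-group shares is my assumption, since the description does not say which group sizes are used.

```python
# Toy sketch of the map's indicator: sum over age groups of
# (local rate - national rate), weighted by group size.
# Every number below is made up for illustration only.

def vaccination_gap(local_rates, national_rates, group_shares):
    """Weighted sum of local-minus-national rate differences, per age group."""
    return sum(w * (loc - nat)
               for loc, nat, w in zip(local_rates, national_rates, group_shares))

national = [0.40, 0.60, 0.85]   # hypothetical national rates per age group
shares = [0.3, 0.4, 0.3]        # hypothetical age-group shares (sum to one)
# Two hypothetical districts sharing the same age structure, for simplicity
districts = {"A": [0.45, 0.65, 0.90], "B": [0.35, 0.55, 0.80]}

gaps = {name: vaccination_gap(rates, national, shares)
        for name, rates in districts.items()}
print(gaps)
```

In this toy setting one district necessarily ends up above zero and the other below, whatever the cause of the difference.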

when perfect correlation just means… perfect!

Posted in Statistics on February 6, 2018 by xi'an

When looking at an X validated question on generating two perfectly negatively correlated Bernoulli variates last week, my intuition was that one had to be the opposite of the other, which means their parameters had to sum up to one. An intuition that was plain easy to back up by solving the equation

corr(C¹,C²)=-1

in terms of the joint distribution of (C¹,C²). That perfect correlation implies strong constraints on the parameters of the Bernoulli variates is not highly surprising given their binary support. Although I had no time to pursue the issue, I idly wondered about the generalisation to, say, a Binomial case, i.e., whether or not the complementary case remains the only possible one for the above to hold. But again a perfect correlation can only occur with perfect (linear) prediction, i.e., when the Binomial variates have the same number of trials and complementary probability parameters. (Of no particular relevance is the fact that the originator of the question preferred an answer that showed how to simulate two Bernoulli such that C¹+C²=1!)
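As a quick check of the Bernoulli constraint, the following Python sketch computes the most negative correlation achievable for given parameters p1 and p2. It uses the Fréchet-Hoeffding lower bound on P(C¹=1,C²=1), which is my choice of argument rather than the one in the X validated answer, and the function names are hypothetical.

```python
import random
from math import sqrt

def min_bernoulli_corr(p1, p2):
    """Smallest correlation achievable between Bernoulli(p1) and Bernoulli(p2).

    P(C1=1, C2=1) cannot go below max(0, p1 + p2 - 1) (Frechet-Hoeffding
    lower bound), which yields the most negative covariance.
    """
    p11 = max(0.0, p1 + p2 - 1.0)
    cov = p11 - p1 * p2
    return cov / sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def opposite_pair(p, rng):
    """Simulate (C1, C2) with C1 ~ Bernoulli(p) and C2 = 1 - C1."""
    c1 = 1 if rng.random() < p else 0
    return c1, 1 - c1

print(min_bernoulli_corr(0.3, 0.7))  # complementary parameters
print(min_bernoulli_corr(0.3, 0.5))  # non-complementary parameters
print(opposite_pair(0.3, random.Random(1)))
```

The bound reaches -1 only when p1+p2=1. In that case the joint distribution puts no mass on (1,1) or (0,0), so one variate is the opposite of the other.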

correlation matrices on copulas

Posted in R, Statistics, University life on July 4, 2016 by xi'an

Following my post of yesterday about the missing condition in Lynch’s R code, Gérard Letac sent me a paper he recently wrote with Luc Devroye on correlation matrices and copulas, written for the memorial volume in honour of Marc Yor. It considers the neat problem of the existence of a copula (on [0,1]×…×[0,1]) associated with a given correlation matrix R, and establishes this existence up to dimension n=9. The proof is based on the consideration of the extreme points of the set of correlation matrices. The authors conjecture the existence of (10,10) correlation matrices that cannot be the correlation matrix of a copula. The paper also contains a result that answers my (idle) puzzling of many years, namely how to set the correlation matrix of a Gaussian copula to achieve a given correlation matrix for the copula. More precisely, the paper links the [correlation] matrix R of X~N(0,R) with the [correlation] matrix R⁰ of Φ(X) by

r^0_{ij}=\frac{6}{\pi}\arcsin\{r_{ij}/2\}

A side consequence of this result is that there exist correlation matrices of copulas that cannot be associated with Gaussian copulas. Like

R=\left[\begin{matrix} 1 &-1/2 &-1/2\\-1/2 &1 &-1/2\\-1/2 &-1/2 &1 \end{matrix}\right]
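The side consequence can be checked numerically. The Python sketch below takes the matrix above as the target copula correlation, i.e., in the role of R⁰ in the arcsine formula. It inverts the formula to get the Gaussian correlation that would be needed, then looks at the smallest eigenvalue of the resulting equicorrelation matrix. The function names are hypothetical.

```python
from math import asin, sin, pi

def copula_corr_from_gaussian(r):
    """Correlation of (Phi(X_i), Phi(X_j)) when corr(X_i, X_j) = r."""
    return 6.0 / pi * asin(r / 2.0)

def gaussian_corr_for_copula(r0):
    """Inverse map: Gaussian correlation needed to reach copula correlation r0."""
    return 2.0 * sin(pi * r0 / 6.0)

# Gaussian correlation needed to produce off-diagonal entries equal to -1/2
r = gaussian_corr_for_copula(-0.5)

# A 3x3 matrix with unit diagonal and common off-diagonal r has eigenvalues
# 1 + 2r (once) and 1 - r (twice); for negative r the smallest is 1 + 2r.
smallest_eigenvalue = 1.0 + 2.0 * r
print(r, smallest_eigenvalue)
```

The required Gaussian correlation is 2 sin(-π/12), slightly below -1/2, so 1+2r is negative and the corresponding matrix is not a correlation matrix. The target matrix itself is fine, with smallest eigenvalue 1+2(-1/2)=0.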

another wrong entry

Posted in Books, Kids, R, Statistics, University life on June 27, 2016 by xi'an

Quite a coincidence! I just came across another bug in Lynch’s (2007) book, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. Already discussed here and on X validated. While working with one participant in the post-ISBA softshop, we were looking for efficient approaches to simulating correlation matrices and came [by Google] across the above R code associated with a 3×3 correlation matrix, which misses the additional constraint that the determinant must be non-negative. As shown e.g. by the example

> eigen(matrix(c(1,-.8,.7,-.8,1,.6,.7,.6,1),ncol=3))
$values
[1] 1.8169834 1.5861960 -0.4031794

having all correlations between -1 and 1 is not enough. Just. Not. Enough.
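For the 3×3 case, the missing condition is easy to write down. Here is a small Python sketch (rather than R, and not Lynch's code) that combines the range check with the determinant 1+2abc-a²-b²-c² of the matrix with off-diagonal entries a, b, c. The function names are hypothetical.

```python
def det3_correlation(r12, r13, r23):
    """Determinant of the 3x3 matrix with unit diagonal and these off-diagonals."""
    return 1.0 + 2.0 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2

def is_valid_correlation_3x3(r12, r13, r23):
    """Entries in [-1, 1] plus a non-negative determinant (3x3 case only).

    With a unit diagonal, the 1x1 and 2x2 principal minors are non-negative
    as soon as the entries lie in [-1, 1], so only the determinant remains.
    """
    in_range = all(-1.0 <= r <= 1.0 for r in (r12, r13, r23))
    return in_range and det3_correlation(r12, r13, r23) >= 0.0

# Same entries as in the eigen() example: r12 = -.8, r13 = .7, r23 = .6
print(det3_correlation(-0.8, 0.7, 0.6))
print(is_valid_correlation_3x3(-0.8, 0.7, 0.6))
```

The determinant is negative for these entries, in agreement with the negative eigenvalue returned by eigen(). This shortcut only covers dimension three. In higher dimensions one would check all eigenvalues, or attempt a Cholesky factorisation.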

Gauss to Laplace transmutation interpreted

Posted in Books, Kids, Statistics, University life on November 9, 2015 by xi'an

Following my earlier post [induced by browsing X validated], on the strange property that the product of a Normal variate by the square root of an Exponential variate is (up to scale) a Laplace variate, I got contacted by Peng Ding from UC Berkeley, who showed me how to derive the result by a mere algebraic transform, related with the decomposition

(X+Y)(X-Y)=X²-Y² ~ 2XY

when X,Y are iid Normal N(0,1). Peng Ding and Joseph Blitzstein have now arXived a note detailing this derivation, along with another derivation using the moment generating function. As a coincidence, I also came across another interesting representation on X validated, namely that, when X and Y are Normal N(0,1) variates with correlation ρ,

XY ~ R(cos(πU)+ρ)

with R Exponential and U Uniform (0,1). As shown by the OP of that question, it is a direct consequence of the decomposition of (X+Y)(X-Y) and of the polar or Box-Muller representation. This does not lead to a standard distribution of course, but remains a nice representation of the product of two Normals.
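A quick Monte Carlo sanity check of this representation, as a Python sketch. It assumes R is Exponential with rate one, which is what the Box-Muller argument gives. It compares only the first two moments, which should be ρ and 1+ρ² on both sides. The seed, sample size and function names are arbitrary choices of mine.

```python
import random
from math import cos, pi, sqrt

def sample_product(rho, rng):
    """Draw XY for standard Normals X, Y with correlation rho."""
    x = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)
    y = rho * x + sqrt(1.0 - rho**2) * z
    return x * y

def sample_representation(rho, rng):
    """Draw R (cos(pi U) + rho) with R ~ Exponential(1) and U ~ Uniform(0,1)."""
    r = rng.expovariate(1.0)
    u = rng.random()
    return r * (cos(pi * u) + rho)

def mean_and_variance(values):
    """Empirical mean and (population) variance of a list of floats."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / n

rng = random.Random(2015)
rho, n = 0.5, 200_000
m_prod, v_prod = mean_and_variance([sample_product(rho, rng) for _ in range(n)])
m_repr, v_repr = mean_and_variance([sample_representation(rho, rng) for _ in range(n)])
print(m_prod, v_prod)
print(m_repr, v_repr)
```

Matching moments is of course no proof of equality in distribution. Comparing empirical quantiles of the two samples would be a natural next step.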
