## F(1-F)

Posted in Books, Kids, Statistics on March 9, 2022 by xi'an

When answering an X validated question about the covariance between a random variable X and its cdf transform F(X), I realised that this covariance was half the integral of the function

x → F(x)(1-F(x))

when X is centred. It is not surprising in the least to see the cdf appearing for this second order expectation, since it can similarly be used to represent first order expectations (as exploited by nested sampling). But it is easy to be confused by the fact that F(X) is usually a Uniform (0,1) variate hence distribution-free, until one sees it remains positively correlated with X, or by the apparent lack of scale or by the symmetry, until one realises this is not the case. (The associated correlation is scale-free.)

## a meaningful divide?

Posted in Books, Mountains, pictures, Statistics, Travel on August 16, 2021 by xi'an

Le Monde published this map in its 26 July edition, to illustrate the contrast between South-East and North and West France(s), meaning that the North-West upper part of the map is more vaccinated than the South-East lower part. The figure is computed as the sum of the differences between local and national rates, per age group, weighted by the group sizes. The paper goes on to analyse the divide in terms of the sociology of the territories, as well as political opposition to Président Macron… But I wonder (over breakfast) whether it does not read too much into this picture. First, some districts necessarily sit above the national average and others below it. Second, the map does not incorporate population density: very sparsely populated districts in the South-East, like Auvergne or central Corsica, are more visible than the densest areas like Greater Paris, while being more prone to low vaccination rates due to the larger distance to vaccination centres. Third, most of the districts are within ±15% of the average, which may be too large for statistical variation, but not by much! The geographer Emmanuel Vigneron points out in the paper an inverse correlation between vaccination and earlier COVID cases, but this is not so surprising in that people who have already been exposed to the virus may conclude they are well (enough) protected. Further, the age effect is not eliminated by the contrast, in that areas with an older population are bound to get closer to the average, given that vaccination in the older groups started earlier and was more often seen as a life-or-death issue. The soundest observation is rather the opposition within urban areas where, despite equivalent access to vaccination opportunities, the poorer burbs, like the Northern districts of Marseille, are the least vaccinated (with possibly an age effect?).
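A toy sketch of why the first point holds, under assumptions of mine about how the index is built: a district's score is the sum over age groups of (local group size) × (local rate - national rate), divided by the district population, and the national rate of a group is the population-weighted mean of the local rates. All numbers below are made up. Under these assumptions the population-weighted mean of the scores is exactly zero, so some districts must land above the average and others below.

```python
def district_scores(counts, rates):
    """Hypothetical index: counts[d][g] people and rates[d][g] vaccination rate
    for age group g in district d. Returns national rates per group and scores."""
    groups = range(len(counts[0]))
    # national rate per age group = population-weighted mean of local rates
    national = []
    for g in groups:
        pop = sum(c[g] for c in counts)
        national.append(sum(c[g] * r[g] for c, r in zip(counts, rates)) / pop)
    # score = group-size-weighted sum of local minus national rates, per head
    scores = []
    for c, r in zip(counts, rates):
        diff = sum(c[g] * (r[g] - national[g]) for g in groups)
        scores.append(diff / sum(c))
    return national, scores

# made-up data: three districts, two age groups (younger, older)
counts = [[800, 200], [500, 500], [100, 50]]
rates = [[0.60, 0.90], [0.55, 0.85], [0.40, 0.70]]
national, scores = district_scores(counts, rates)
weighted_total = sum(s * sum(c) for s, c in zip(scores, counts))
print(scores, weighted_total)  # weighted_total vanishes up to rounding
```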

## when perfect correlation just means… perfect!

Posted in Statistics on February 6, 2018 by xi'an

When looking at an X validated question on generating two perfectly negatively correlated Bernoulli variates last week, my intuition was that one had to be the opposite of the other, which means their parameters had to sum up to one. Intuition that was plain easy to back up by solving the equation

corr(C¹,C²)=-1

in terms of the joint distribution of (C¹,C²). That perfect correlation implies strong constraints on the parameters of the Bernoulli variates is not highly surprising given their binary support. Although I had no time to pursue the issue, I idly wondered about the generalisation to, say, a Binomial case, i.e., whether or not this case is still the only possible one for the above to hold. But again, a perfect correlation can only occur with perfect prediction, i.e., when the Binomial variates have the same number of trials and complementary probability parameters. (Of no particular relevance is the fact that the originator of the question preferred an answer that showed how to simulate two Bernoulli variates such that C¹+C²=1!)
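Solving the equation takes a few lines, as a sketch: for C¹~B(p) and C²~B(q), the covariance is P(C¹=1,C²=1) - pq, and the joint probability cannot go below the Fréchet lower bound max(0, p+q-1), so the smallest attainable correlation reaches -1 only when p+q=1. The last lines check the Binomial case by simulation, taking C² = n - C¹ with n and p picked arbitrarily. Function names are hypothetical.

```python
import math
import random

def min_corr_bernoulli(p, q):
    """Smallest attainable corr(C1, C2) for C1 ~ B(p), C2 ~ B(q), 0 < p, q < 1.

    cov = P(C1=1, C2=1) - p*q, minimised at the Frechet lower bound
    P(C1=1, C2=1) = max(0, p + q - 1).
    """
    p11 = max(0.0, p + q - 1.0)
    return (p11 - p * q) / math.sqrt(p * (1 - p) * q * (1 - q))

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(min_corr_bernoulli(0.3, 0.7))  # -1 up to rounding: parameters sum to one
print(min_corr_bernoulli(0.3, 0.6))  # about -0.80: -1 is out of reach

# Binomial case: C2 = n - C1 is Binomial(n, 1 - p), perfectly anti-correlated with C1
rng = random.Random(0)
n, p = 10, 0.3
c1 = [sum(rng.random() < p for _ in range(n)) for _ in range(5000)]
c2 = [n - c for c in c1]
print(pearson(c1, c2))  # -1 up to rounding
```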

## correlation matrices on copulas

Posted in R, Statistics, University life on July 4, 2016 by xi'an

Following my post of yesterday about the missing condition in Lynch’s R code, Gérard Letac sent me a paper he recently wrote with Luc Devroye on correlation matrices and copulas. Paper written for the memorial volume in honour of Marc Yor. It considers the neat problem of the existence of a copula (on [0,1]x…x[0,1]) associated with a given correlation matrix R. And establishes this existence up to dimension n=9. The proof is based on the consideration of the extreme points of the set of correlation matrices. The authors conjecture the existence of (10,10) correlation matrices that cannot be a correlation matrix for a copula. The paper also contains a result that answers my (idle) puzzling of many years, namely on how to set the correlation matrix of a Gaussian copula to achieve a given correlation matrix R for the copula. More precisely, the paper links the [correlation] matrix R of X~N(0,R) with the [correlation] matrix R⁰ of Φ(X) by

$r^0_{ij}=\frac{6}{\pi}\arcsin\{r_{ij}/2\}$
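The formula can be checked by simulation, as a sketch: draw a standard bivariate Normal with correlation r, map both coordinates through Φ, and compare the empirical correlation of the transformed pair with (6/π) arcsin(r/2). Sample size, seed and the values of r are arbitrary choices of mine, and the helper names are hypothetical.

```python
import math
import random

def phi(x):
    """Standard Normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def copula_corr_mc(r, n=100_000, seed=1):
    """Monte Carlo corr(Phi(X1), Phi(X2)) for a standard bivariate Normal with correlation r."""
    rng = random.Random(seed)
    u1, u2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x2 = r * z1 + math.sqrt(1.0 - r * r) * z2  # corr(z1, x2) = r
        u1.append(phi(z1))
        u2.append(phi(x2))
    return pearson(u1, u2)

for r in (-0.9, 0.3, 0.8):
    print(r, copula_corr_mc(r), 6 / math.pi * math.asin(r / 2))
```

Since (6/π) arcsin(r/2) is slightly smaller than r in absolute value for 0 < |r| < 1, the copula correlations are pulled towards zero, which is what opens the door to the counter-example below.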

A side consequence of this result is that there exist correlation matrices of copulas that cannot be associated with Gaussian copulas. Like

$R=\left[\begin{matrix} 1 &-1/2 &-1/2\\-1/2 &1 &-1/2\\-1/2 &-1/2 &1 \end{matrix}\right]$
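The reason is quick to verify, as a sketch: inverting the formula, a copula correlation of -1/2 requires a Gaussian correlation r = 2 sin(-π/12) ≈ -0.5176, and the 3×3 matrix with all off-diagonal entries equal to that r has a negative determinant, (1-r)²(1+2r) < 0, so it is not a correlation matrix. The matrix R itself has determinant zero, hence is positive semi-definite. Function names are hypothetical.

```python
import math

def gaussian_r_from_copula_r(r0):
    """Invert r0 = (6/pi) * asin(r/2): the Gaussian correlation producing copula correlation r0."""
    return 2 * math.sin(math.pi * r0 / 6)

def det3(r12, r13, r23):
    """Determinant of the 3x3 correlation matrix with off-diagonal entries r12, r13, r23."""
    return 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2

r0 = -0.5
r = gaussian_r_from_copula_r(r0)  # about -0.5176, slightly below -1/2
print(det3(r0, r0, r0))           # 0.0: R is a singular but valid correlation matrix
print(det3(r, r, r))              # negative: no Gaussian correlation matrix maps to R
```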

## another wrong entry

Posted in Books, Kids, R, Statistics, University life on June 27, 2016 by xi'an

Quite a coincidence! I just came across another bug in Lynch’s (2007) book, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists, already discussed here and on X validated. While working with a participant in the post-ISBA softshop, we were looking for efficient approaches to simulating correlation matrices and came [by Google] across the above R code associated with a 3×3 correlation matrix, which misses the additional constraint that the determinant must be positive. As shown, e.g., by the example

```r
> eigen(matrix(c(1,-.8,.7,-.8,1,.6,.7,.6,1),ncol=3))
$values
[1] 1.8169834 1.5861960 -0.4031794
```


having all correlations between -1 and 1 is not enough. Just. Not. Enough.
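For a 3×3 matrix with unit diagonal, the missing check boils down to one determinant, as a sketch: by Sylvester's criterion the leading minors are 1, 1-r₁₂² and the determinant, so once |r₁₂| < 1 the matrix is positive definite exactly when the determinant is positive. The sampler below is a naive rejection fix of my own (uniform off-diagonal entries, kept only when positive definite), not Lynch's code, and the function names are hypothetical.

```python
import random

def det3(r12, r13, r23):
    """Determinant of [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]."""
    return 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2

def is_correlation3(r12, r13, r23):
    """Sylvester's criterion: leading minors 1, 1 - r12^2 and the determinant all positive."""
    return abs(r12) < 1 and det3(r12, r13, r23) > 0

def sample_correlation3(rng):
    """Naive rejection: uniform off-diagonals in [-1, 1), kept only if positive definite."""
    while True:
        r12, r13, r23 = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if is_correlation3(r12, r13, r23):
            return r12, r13, r23

# the R example above: all entries in (-1, 1), yet the determinant is negative
print(det3(-0.8, 0.7, 0.6))             # about -1.162, the product of the three eigenvalues
print(is_correlation3(-0.8, 0.7, 0.6))  # False

rng = random.Random(2016)
draws = [sample_correlation3(rng) for _ in range(1000)]
```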