Archive for Spearman rank test

Correlations between the physical and social sciences

Posted in Books, Statistics, University life on January 18, 2012 by xi'an

This is probably the most bizarre book I have received for review (so far). Its title is wide-ranging: Correlations Between the Physical and Social Sciences. Its cover is enticing: a picture of the young Albert Einstein. Its purpose is equally broad:

“The thesis of this monograph is that societies in general are governed by objective laws that have their roots in human nature. The task of the social scientist is to discover and explore those laws (…) Null hypotheses and alternative rival hypotheses developed by social scientists must eclectically correlated to mathematical formulae or the laws of physics in order to advance non-speculative, unbiased knowledge.” V.J. Belfiglio (p.x)

So the thesis advanced in Correlations Between the Physical and Social Sciences by Valentine Belfiglio is that social problems can be represented in terms of physical laws. The 41-page book pushes this argument through four case studies.

“The first case study relates marital assimilation of minority groups into dominate core cultures with Graham’s Law for the diffusion of gases. The second case study relates the mutual hostility of political leaders with the Mirror Equation employed in basic geometric optics. The third case study relates the duration of major American military conflicts to the formulae for empirical and subjective probabilities. The fourth case study relates the radioactive decay formula for radioactive substances to the rate of decline of several extinct empires” V.J. Belfiglio (p.xi)

As the author himself recognises, “the four case studies in this monograph do not provide definitive answers.” My opinion is that they do not provide answers at all! Indeed, the first chapter contains two 2×2 tables about the endogamous preferences of Mexican and Italian inhabitants of Dallas, Texas. A chi-square test concludes that Mexicans prefer endogamy and that Italians do not. Although Graham’s Law is re-expressed there as “marital assimilation being inversely proportional to the square root of the population densities” (p.3), no result based on the data supports this law. The second chapter tries to “explore the mutuality of hostility between the Bush and Ahmandinejad (sic) administrations” (p.11) by means of Spearman’s rho correlation coefficient, which is found to demonstrate “a perfect positive correlation” (p.12), although the data are quantitative (intensity of hostility between 1 and 9) and not paired. (The study simply shows that the empirical cdfs of the hostility values for both sides are approximately the same, Spearman’s rho test being inappropriate there.) The connection with optics is at best tenuous. Chapter 3 centres on a table of the durations of major American (meaning US) military conflicts. The mere observation is that the US “has been engaged in major wars 56.5 percent of the time between 1775-2010.” (p.24), but Valentine Belfiglio turns this into an “empirical probability” (i.e. the frequency of wars), a “subjective probability” (i.e. the average number of years of peace between wars), and the “number of possible interaction channels” (i.e. a combination number) as a way to link American foreign policy with probability theory. Again, the connection is non-existent. The fourth and final chapter is about the “correlation between the decay of radioactive substances and the rate of decline of empires.” (p.31) The data consist of the durations of seven empires, along with estimates of their half-lives. The chapter concludes with “a perfect negative correlation between the half-lives of empires and their rates of decline” (p.35), which is not very surprising given that one is a monotonic function of the other…
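This last point takes a few lines of R to check. Assuming the rate of decline is derived from the half-life as in radioactive decay, λ = log(2)/t_half (an assumption about the book's computation, not a quote from it), the rate is a strictly decreasing function of the half-life, so Spearman's rho is exactly -1 whatever the data. The half-lives below are arbitrary illustrative values, not those of the book.

```r
# Illustrative half-lives in years: arbitrary values, NOT the book's data
half_life <- c(50, 120, 200, 310, 75, 400, 160)
# Assumed decay-type rate of decline: lambda = log(2) / half-life
decline_rate <- log(2) / half_life
# A strictly decreasing transform reverses the ranks exactly,
# so the rank correlation is -1 by construction
rho <- cor(half_life, decline_rate, method = "spearman")
print(rho)
```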

“I conclude with the words of Henry Wadsworth Longfellow: ‘Sometimes we may learn more from a man’s errors, than from his virtues’.” V.J. Belfiglio (p.40)

There is therefore not much to discuss about this book: it does not go beyond stating the obvious, while the connection between the observed social phenomena and generic physical laws remains at the level of a literary ellipse, not of a scientific demonstration. I am deeply puzzled as to why a publisher would want to publish this… Any review of the material should have shown that the author (whose speciality at Texas Woman’s University is Government) was out of his depth in this particular endeavour of proving that “mathematical formulae and the law of physics can take scholars further in deriving conclusions from sets of assumptions than can inferential statistics” (back-cover).

Le Monde [reverse] rank test

Posted in Statistics on April 13, 2010 by xi'an

This is the fourth and hopefully last post about this puzzle. Translated, the problem proposed by Le Monde reads as follows:

Twenty pupils in the class have different grades, namely the integers from 1 to 20. The ten girls in the class are ordered from the best grade to the worst one, while the ten boys are ordered from the worst grade to the best one. The absolute differences between the pairs thus formed are computed and summed up. What is the range of this sum?

which is different from what I “read”, namely that both boys and girls were ranked in increasing order. Of course, “my” reading makes more sense (!) from a statistical point of view, because it defines a rank test of whether both samples have the same distribution. (The range is then between 10 and 100.) However, the solution to the original problem published in the weekend special edition is that the sum is always equal to 100. The argument is that each pair necessarily matches a grade between 1 and 10 with a grade between 11 and 20 (if, say, the i-th girl and the i-th boy both had grades at most 10, so would the girls ranked after her and the boys ranked before him, making 11 distinct grades out of 10 possible values), so that the grades larger than 10 always get a positive sign, while the grades at most 10 always get a negative sign, leading to

\sum_{i=1}^{10} (10+i) - \sum_{i=1}^{10} i = 10\times 10 = 100.

Obviously, this result holds for any balanced class: with n girls, n boys and grades 1 to 2n, the sum is always n^2. This is however much less interesting from a statistical perspective.
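Both readings can be checked by simulation. In this sketch the helper names reverse_sum and increasing_sum are hypothetical: under the Le Monde reading the sum should always equal n^2, while under the increasing-order reading it should stay between n and n^2.

```r
# Le Monde reading: girls sorted from best to worst grade,
# boys from worst to best, then paired in that order
reverse_sum <- function(grades, n) {
  girls <- sort(grades[1:n], decreasing = TRUE)
  boys  <- sort(grades[(n + 1):(2 * n)])
  sum(abs(girls - boys))
}
# Alternative reading: both groups sorted in increasing order
increasing_sum <- function(grades, n) {
  sum(abs(sort(grades[1:n]) - sort(grades[(n + 1):(2 * n)])))
}
set.seed(1)
n <- 10
# random allocation of the grades 1..2n between girls (first n) and boys
rev_sums <- replicate(10^4, reverse_sum(sample(1:(2 * n)), n))
inc_sums <- replicate(10^4, increasing_sum(sample(1:(2 * n)), n))
print(unique(rev_sums))  # a single value, n^2 = 100
print(range(inc_sums))   # within [n, n^2] = [10, 100]
```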

Ps- I recently found out that both writers of the “Affaire de Logique” page in the weekend Le Monde magazine, Elisabeth Busser and Gilles Cohen, are in fact editors of a math fanzine called Tangente. Gilles Cohen wrote a laudatory review of the book Le Mythe Climatique by Benoît Rittaud, next to an explanation by Benoît Rittaud of the findings of Ed Wegman and of his Academy of Sciences committee about the hockey stick temperature curve. While the problem with the hockey stick is clear enough, the data being recentred only against recent observations, the explanations given in Tangente are fairly obscure. By coincidence, Benoît Rittaud just decided to put his blog on hold and to move to a collective climatoskeptic blog called skyfall.

Le Monde rank test (corr’d)

Posted in R, Statistics on April 7, 2010 by xi'an

Since my first representation of the rank statistic as paired was incorrect, here is the histogram produced by the simulation

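saple=rep(0,10^4)   # storage for the simulated statistics
for (t in 1:10^4){ perm=sample(1:20)   # random allocation of the ranks 1..20
  saple[t]=sum(abs(sort(perm[1:10])-sort(perm[11:20])))}   # sort each group, then pair
hist(saple)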

when n=20. It is obviously much closer to zero than previously.

An interesting change is that the regression of the log-mean on log(n) produces

> lm(log(memean)~log(enn))
Call:
lm(formula = log(memean) ~ log(enn))
Coefficients:
(Intercept)     log(enn)
 -1.162        1.499

meaning that the mean grows like n^{3/2} rather than like n or n^2, as confirmed by regressing the mean on eth = n^{3/2} without intercept:

> summary(lm(memean~eth-1))
Coefficients:
      Estimate Std. Error t value Pr(>|t|)
eth 0.3117990  0.0002719    1147   <2e-16 ***

with a very good fit.
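The log-log regression can be reproduced along the following lines. The grid of (even) total sample sizes, the number of replications and the helper footrule_mean are assumptions for this sketch, not the original script; the fitted slope should come out close to 3/2.

```r
set.seed(2)
# Monte Carlo mean of the corrected statistic: the ranks 1..n are split
# at random into two groups of size n/2, each group is sorted, then paired
footrule_mean <- function(n, reps = 2000) {
  m <- n / 2
  mean(replicate(reps, {
    perm <- sample(1:n)
    sum(abs(sort(perm[1:m]) - sort(perm[(m + 1):n])))
  }))
}
enn <- seq(20, 400, by = 20)          # assumed grid of total sample sizes
memean <- sapply(enn, footrule_mean)  # estimated means
fit <- lm(log(memean) ~ log(enn))     # log-mean regressed on log(n)
slope <- unname(coef(fit)[2])
print(coef(fit))
```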

Le Monde rank test (cont’d)

Posted in R, Statistics on April 5, 2010 by xi'an

Following a comment from efrique pointing out that this statistic is called the Spearman footrule, I want to clarify the notation in

\mathfrak{M}_n = \sum_{i=1}^n |r^x_i-r^y_i|\,,

namely (a) that the ranks of x_i and y_i are considered for the whole sample, i.e.

\{r^x_1,\ldots,r^x_n,r^y_1,\ldots,r^y_n\} = \{1,\ldots,2n\}

instead of being computed separately for the x's and the y's, and then (b) that the ranks are reordered (sorted) within each group (meaning that the groups could be of different sizes). This statistic is therefore different from the Spearman footrule studied by Persi Diaconis and R. Graham in a 1977 JRSS paper,

\mathfrak{D}_n = \sum_{i=1}^n |\pi(i)-\sigma(i)|\,,

where \pi and \sigma are permutations from \mathfrak{S}_n. When \pi is uniformly distributed, the mean of \mathfrak{D}_n is (n^2-1)/3, i.e. approximately n^2/3. I mistakenly referred to Spearman’s ρ rank correlation test in the previous post. It is actually more closely related to the Siegel-Tukey test, even though I think there exists a non-parametric test of iid-ness for paired observations… The x's and the y's are thus not paired, despite what I wrote previously. This distance must be related to some non-parametric test for checking the equality of location parameters.
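The mean of \mathfrak{D}_n is easy to check: for a uniformly random permutation, each \pi(i) is uniform on \{1,\ldots,n\}, so that \mathbb{E}[\mathfrak{D}_n] = n^{-1}\sum_{i,j}|i-j| = (n^2-1)/3. The R sketch below takes \sigma as the identity, which loses no generality since \pi\circ\sigma^{-1} is again uniform; the function names are hypothetical.

```r
# Exact mean of D_n = sum |pi(i) - i| for a uniformly random permutation:
# pi(i) is uniform on 1..n, hence E[D_n] = sum_{i,j} |i - j| / n
exact_mean_D <- function(n) sum(abs(outer(1:n, 1:n, "-"))) / n
# Monte Carlo counterpart, drawing random permutations with sample()
mc_mean_D <- function(n, reps = 10^4) {
  mean(replicate(reps, sum(abs(sample(1:n) - 1:n))))
}
set.seed(5)
n <- 12
print(c(exact = exact_mean_D(n), formula = (n^2 - 1) / 3, mc = mc_mean_D(n)))
```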

Le Monde rank test

Posted in R, Statistics on April 5, 2010 by xi'an

In the puzzle found in this weekend's Le Monde, the mathematical object behind the silly story is a pseudo-Spearman rank correlation test statistic,

\mathfrak{M}_n = \sum_{i=1}^n |r^x_i-r^y_i|\,,

where the difference between the ranks of the paired random variables x_i and y_i is taken in absolute value instead of being squared as in the Spearman rank test statistic. I don’t know whether or not this measure of distance has been studied in the statistics literature (although I’d be surprised if it had not!). Here is a histogram of the distribution of the new statistic for n=20 under the null hypothesis that both samples are uncorrelated (i.e. that the sequence of ranks is a random permutation). Each point in the sample was obtained by

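saple=rep(0,10^4)   # storage for the simulated statistics
for (t in 1:10^4){ perm=sample(1:20)   # random permutation of the ranks 1..20
  saple[t]=sum(abs(perm[1:10]-perm[11:20]))}   # pair the first and last ten ranks
hist(saple)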

When regressing the mean of this statistic \mathfrak{M}_n against the covariates n and n^2, I obtain the uninspiring formula

\mathbb{E} [\mathfrak{M}_n] \approx 0.1681 n^2 - 0.3769 n + 11.1921

which does not translate into a nice polynomial in n!
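As a side remark, and assuming that n in this formula is the total number of ranks (20 in the code above, split into two groups of 10), the null mean of this paired version can also be computed exactly: each term is the distance between two distinct ranks drawn at random from \{1,\ldots,n\}, whose mean is (n+1)/3, so that \mathbb{E}[\mathfrak{M}_n] = n(n+1)/6 \approx 0.1667 n^2 + 0.1667 n, with a leading coefficient close to the fitted 0.1681. A short R check, with hypothetical function names:

```r
# Exact null mean of the paired (unsorted) statistic, assuming n is the total
# number of ranks: n/2 terms, each the distance between two distinct ranks
# drawn at random from 1..n, with mean (n + 1) / 3
exact_mean_M <- function(n) (n / 2) * (n + 1) / 3   # = n (n + 1) / 6
# Monte Carlo counterpart mimicking the simulation above
mc_mean_M <- function(n, reps = 10^4) {
  m <- n / 2
  mean(replicate(reps, {
    perm <- sample(1:n)
    sum(abs(perm[1:m] - perm[(m + 1):n]))
  }))
}
set.seed(3)
print(c(exact = exact_mean_M(20), mc = mc_mean_M(20)))  # exact value is 70
```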

Another interesting probabilistic/combinatorial problem arising from an earlier Le Monde puzzle: given an urn with n white balls and n black balls that is sampled without replacement, what is the probability that there exists a sequence of 2k consecutive draws with the same number of white and black balls, for a given k in \{1,\ldots,n\}? For k=1 and k=n, the answer is obviously one (1), but for some values of k it is less than one. When n goes to infinity, this is somehow related to the probability that a Brownian bridge crosses the axis between 0 and 1, but I have no clue whether this helps or not! Robin Ryder solved the question for n=50 and k=24,25 by establishing that the probability is still one.
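A Monte Carlo sketch of the urn question, under the assumption that the “sequence of length 2k” means 2k consecutive draws: coding white as +1 and black as -1, such a block is balanced exactly when two partial sums 2k steps apart coincide. The function names are hypothetical; the estimates for k=1 and k=n should be exactly one.

```r
# TRUE if the +1/-1 sequence `draws` contains a block of 2k consecutive
# entries summing to zero (as many white as black balls)
has_balanced_block <- function(draws, k) {
  S <- c(0, cumsum(draws))           # partial sums S_0, ..., S_{2n}
  L <- length(draws)
  j <- 0:(L - 2 * k)                 # 0-based start positions of the blocks
  any(S[j + 2 * k + 1] == S[j + 1])  # block sum S_{j+2k} - S_j equals zero
}
# Monte Carlo estimate over random orderings of n white and n black balls
prob_balanced <- function(n, k, reps = 10^4) {
  mean(replicate(reps, has_balanced_block(sample(rep(c(1, -1), n)), k)))
}
set.seed(4)
print(sapply(1:10, function(k) prob_balanced(10, k)))
```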

Ps- The same math tribune in Le Monde coincidentally advertises a book, Le Mythe Climatique, by Benoît Rittaud that addresses … climate change issues and the “statistical mistakes made by climatologists”. The interesting point (if any) is that Benoît Rittaud is a “mathematician not a statistician”, with a few papers in ergodic theory, but this avowed climatoskeptic nonetheless criticises the use of both statistical and simulation tools in climate modeling. (“Simulation has only been around for a few dozen years, a very short span in the history of sciences. The climate debate may be an opportunity to reassess the role of simulation in the scientific process.”)
