## Le Monde on E. Wegman

Posted in Statistics on December 31, 2011 by xi'an

In addition to the solution to the wrong problem, Le Monde of last weekend also dedicated a full page of its Science leaflet to Michael Mann’s hockey stick curve of temperature increase and the hard time he has been given by climato-skeptics since its publication in 1998… The page includes an insert on Ed Wegman’s 2006 [infamous] report for the U.S. Congress, amply documented on Andrew’s blog, and mentions the May 2011 editorial of Nature on the plagiarism investigation. (I reproduce it above as it is not available on the Le Monde website.)

## València 9 snapshot [1]

Posted in Mountains, Running, Statistics, Travel, University life on June 5, 2010 by xi'an

Last morning, I attended the talks of Michael Goldstein and Herbie Lee, which were very interesting from very different perspectives. Michael talked about computer models, like the climate models that have been so much attacked recently for being “unrealistic”. The difficulty is obviously in dealing with the fact that the model is incorrect, what Michael calls external uncertainty. As statisticians, we are trained to deal with internal uncertainties, i.e. those conditional on the model. Michael did not propose a generic solution to this difficult problem, but he presented a series of principles towards this goal and his paper in the proceedings (which I have not [yet] read) contains examples of conducting this assessment. (I am not sure building a [statistical] model on top of the current [physical] models stands a chance of convincing climato-skeptics, but this is interesting nonetheless.) Herbie addressed a completely different problem, namely the maximisation of a function under constraints when the constraints are partly unknown. (Think of a set whose boundaries are not precisely known.) This was a problem new to me and I plan to read the paper asap, as the design perspective added to the maximisation per se is there to decide whether new [costly] evaluations of the function to maximise are worth making.

Otherwise, the morning was spent in a fruitless pursuit of a wireless connection in the hotel where the conference takes place, as so many people were trying to connect at the same time! I eventually resolved the issue by crossing the road to an internet café and renting an ethernet cable for one hour. The hotel is unsurprisingly the soulless and unhelpful place I expected and I do not find any appeal in the high-rise landscape constituting the neighbourhood. There is however a small track in the bush nearby that makes for a good running place in the early morning. (Finding a cliff that is both bolted and in the shade is going to prove a challenge!)

## Le Monde rank test

Posted in R, Statistics on April 5, 2010 by xi'an

In the puzzle found in Le Monde of this weekend, the mathematical object behind the silly story is defined as a pseudo-Spearman rank correlation test statistic,

$\mathfrak{M}_n = \sum_{i=1}^n |r^x_i-r^y_i|\,,$

where the difference between the ranks of the paired random variables $x_i$ and $y_i$ is in absolute value instead of being squared as in the Spearman rank test statistic. I don’t know whether or not this measure of distance has been studied in the statistics literature (although I’d be surprised had it not been studied!). Here is a histogram of the distribution of the new statistic for $n=20$ under the null hypothesis that both samples are uncorrelated (i.e. that the sequence of ranks is a random permutation). Each point in the sample was obtained by

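    perm=sample(1:20)  # random permutation of the ranks 1..20
    saple[t]=sum(abs(perm[1:10]-perm[11:20]))  # t-th simulated value: first ten ranks paired with the last ten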
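As a cross-check, the two R lines can be transcribed into a self-contained simulation, sketched here in Python. Note that, as written, the R lines shuffle 20 ranks and pair the first ten with the last ten, so each draw is a sum of 10 absolute differences. The function name `split_rank_statistic`, the seed and the number of replications (10,000) are assumptions for illustration, not taken from the post.

```python
import random


def split_rank_statistic(n_ranks, rng):
    """One draw: shuffle the ranks 1..n_ranks, then pair first half with second half."""
    perm = list(range(1, n_ranks + 1))
    rng.shuffle(perm)  # uniform random permutation, like sample(1:n_ranks) in R
    half = n_ranks // 2
    # Sum of absolute differences between paired ranks
    return sum(abs(a - b) for a, b in zip(perm[:half], perm[half:]))


rng = random.Random(2010)  # arbitrary seed, for reproducibility only
draws = [split_rank_statistic(20, rng) for _ in range(10_000)]  # assumed replication count

# Summary of the simulated sample (a histogram of `draws` would mirror the post's figure)
print(min(draws), max(draws), sum(draws) / len(draws))
```

With 20 ranks every draw is an even number between 10 and 100, since the ten differences sum to the parity of $1+\cdots+20=210$ and are extremal when adjacent ranks, or the top ten against the bottom ten, are paired.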

When regressing the mean of this statistic $\mathfrak{M}_n$ against the covariates $n$ and $n^2$, I obtain the uninspiring formula

$\mathbb{E} [\mathfrak{M}_n] \approx 0.1681 n^2 - 0.3769 n + 11.1921$

which does not translate into a nice polynomial in $n$!
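
A possible reading of these coefficients, assuming the simulation is exactly the two R lines above, with $n$ the total number of shuffled ranks paired as first half against second half (an assumption, since the code used for other values of $n$ is not shown): each pair $(a,b)$ is then a uniform draw of two distinct ranks among $1,\ldots,n$, so that

$\mathbb{E}|a-b| = \frac{2}{n(n-1)}\sum_{d=1}^{n-1} d\,(n-d) = \frac{n+1}{3}$

and, summing over the $n/2$ pairs,

$\mathbb{E}[\text{simulated statistic}] = \frac{n}{2}\cdot\frac{n+1}{3} = \frac{n(n+1)}{6} \approx 0.1667\,n^2 + 0.1667\,n$

The leading term agrees with the fitted $0.1681\,n^2$, and at $n=20$ this gives $70$ against about $70.9$ for the fitted formula; the lower-order fitted coefficients would then reflect regression noise. For $\mathfrak{M}_n$ as defined, with $n$ pairs and an independent random permutation $\sigma$ of the ranks, the same computation gives $\sum_{i=1}^n \mathbb{E}|\sigma(i)-i| = (n^2-1)/3$ instead.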

Another interesting probabilistic/combinatorial problem arising from an earlier Le Monde puzzle: given an urn with $n$ white balls and $n$ black balls that is sampled without replacement, what is the probability that there exists a sequence of $2k$ consecutive draws with the same number of white and black balls, for $k=1,\ldots,n$? If $k=1$ or $k=n$, the answer is obviously one (1), but for some values of $k$, it is less than one. When $n$ goes to infinity, this is somehow related to the probability that a Brownian bridge crosses the axis between $0$ and $1$ but I have no clue whether this helps or not! Robin Ryder solved the question for the values $n=50$ and $k=24,25$ by establishing that the probability is still one.
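
For small $n$ the probability can be obtained by brute force, which gives a feel for which values of $k$ fall below one. The sketch below (in Python) assumes that "sequence" means $2k$ consecutive draws and enumerates all $\binom{2n}{n}$ equally likely colour arrangements; the function name `prob_balanced_window` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def prob_balanced_window(n, k):
    """Exact probability that some 2k consecutive draws hold k white and k black balls."""
    hits = 0
    total = 0
    # Each choice of the n white positions among 2n draws is equally likely
    for whites in combinations(range(2 * n), n):
        white_set = set(whites)
        # prefix[t] = number of white balls among the first t draws
        prefix = [0]
        for pos in range(2 * n):
            prefix.append(prefix[-1] + (1 if pos in white_set else 0))
        total += 1
        # A window starting after draw t is balanced when it contains exactly k whites
        if any(prefix[t + 2 * k] - prefix[t] == k
               for t in range(2 * n - 2 * k + 1)):
            hits += 1
    return Fraction(hits, total)


for k in range(1, 5):
    print(k, prob_balanced_window(4, k))
```

For $n=4$, only $k=3$ falls below one: the probability is $34/35$, the two arrangements without a balanced window of length six being WWBBBBWW and BBWWWWBB. The enumeration grows like $\binom{2n}{n}$, so this is only practical for small $n$ and says nothing about the $n=50$ case.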

Ps- The same math tribune in Le Monde coincidentally advertises a book, Le Mythe Climatique, by Benoît Rittaud that addresses … climate change issues and the “statistical mistakes made by climatologists”. The interesting point (if any) is that Benoît Rittaud is a “mathematician not a statistician”, with a few papers in ergodic theory, but this avowed climatoskeptic nonetheless criticises the use of both statistical and simulation tools in climate modeling. (“Simulation has only been around for a few dozen years, a very short span in the history of sciences. The climate debate may be an opportunity to reassess the role of simulation in the scientific process.”)