## the random variable that was always less than its mean…

Posted in Books, Kids, R, Statistics on May 30, 2016 by xi'an

Although this is far from a paradox when realising why the phenomenon occurs, it took me a few lines to understand why the empirical average of a log-normal sample is apparently a biased estimator of its mean. And why conversely the biased plug-in estimator does not appear to present a bias. To illustrate this “paradox” consider the picture below which compares both estimators of the mean of a log-normal LN(0,σ²) distribution as σ² increases: blue stands for the empirical mean, while gold corresponds to the plug-in estimator exp(σ²/2) when σ² is estimated from the log-sample, as in a normal sample. (The sample is of size 10⁶.) The gold sequence remains around one, while the blue one drifts away towards zero…

The question came on X validated and my first reaction was to doubt an implementation whose outcome was so counter-intuitive. But then I thought further about the representation of a log-normal variate as exp(σξ) when ξ is a standard Normal variate. When σ grows large enough, it is near impossible for σξ to be larger than σ². More precisely,

$\mathbb{P}(X>\mathbb{E}[X])=\mathbb{P}(\sigma\xi>\sigma^2/2)=1-\Phi(\sigma/2)$

which can be arbitrarily small.
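As a rough check of the argument above, here is a minimal Python sketch that evaluates 1-Φ(σ/2) and compares the two estimators on a simulated LN(0,σ²) sample. The function names, the seed and the sample size of 10⁵ (smaller than the 10⁶ used for the picture) are my own choices for illustration, and the plug-in estimator is taken as exp(s²/2), with s² the sample variance of the logs and the location 0 assumed known.

```python
import math
import random
from statistics import NormalDist


def prob_above_mean(sigma):
    """P(X > E[X]) = 1 - Phi(sigma / 2) for X ~ LN(0, sigma^2)."""
    return 1.0 - NormalDist().cdf(sigma / 2.0)


def estimator_ratios(sigma, n=100_000, seed=1):
    """Return (empirical mean, plug-in estimate), each divided by the true mean."""
    rng = random.Random(seed)
    logs = [rng.gauss(0.0, sigma) for _ in range(n)]  # log-sample, N(0, sigma^2)
    true_mean = math.exp(sigma ** 2 / 2.0)

    # Empirical average of the log-normal sample itself.
    empirical = sum(math.exp(z) for z in logs) / n

    # Plug-in: estimate sigma^2 from the logs, then apply exp(. / 2).
    # Assumption: the location parameter 0 is treated as known.
    m = sum(logs) / n
    s2 = sum((z - m) ** 2 for z in logs) / (n - 1)
    plug_in = math.exp(s2 / 2.0)

    return empirical / true_mean, plug_in / true_mean


if __name__ == "__main__":
    for sigma in (1.0, 4.0, 7.0, 10.0):
        emp, plug = estimator_ratios(sigma)
        print(f"sigma={sigma:4.1f}  P(X>E[X])={prob_above_mean(sigma):.2e}  "
              f"empirical/true={emp:.3g}  plug-in/true={plug:.3g}")
```

For small σ both ratios should sit near one; as σ grows the empirical ratio should collapse towards zero while the plug-in ratio stays of order one, since the draws that carry the expectation essentially never show up in the sample.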

## Verdun 1916, a hundred years ago

Posted in Kids, pictures on May 29, 2016 by xi'an

## a bone of contention

Posted in pictures on May 28, 2016 by xi'an

“In an age in which ancient genomes can reveal startling links between historical populations, we should ask not just whether remains should be reburied, but who decides and on what grounds.”

An article in Nature described the story of fairly old remains (of the Kennewick Man) in North America that were claimed for reburial by several Native American groups and that were found to be closer [in a genetic sense] to groups that were geographically farther (from South America and even aboriginal Australians). What I find difficult to understand (while it stands at the centre of the legal dispute) is how any group of individuals can advance a claim on bones that are 8,000 years old. With such a time gap (and assuming the DNA analysis is trustworthy) the number of individuals who share the owner of the bones as one ancestor is presumably very large and it is hard to imagine all those descendants coming to an agreement about the management of the said bones. Or even that any descendant has any right to the said bones after so many generations, which may have seen major changes in the way deceased members of the community are treated. I am thus surprised that a court or the US government could even consider such requests.

## another riddle with a stopping rule

Posted in Books, Kids, R on May 27, 2016 by xi'an

A puzzle on The Riddler last week that is rather similar to an earlier one. Given the probabilities (1/2,1/3,1/6) on {1,2,3}, what is the mean of the number N of draws needed to see all possible outcomes and what is the average number of 1’s in those draws? The second question is straightforward, as the proportions of 1’s, 2’s and 3’s in the sequence till all values are observed remain 3/6, 2/6 and 1/6, so that the average number of 1’s is half the mean of N. The first question follows from the representation of the average

$\mathbb{E}[N]=\sum_{n=3}^\infty \mathbb{P}(N>n) + 3$

as the probability to exceed n is the probability that at least one value is not observed by the n-th draw, namely

(1/2)ⁿ+(2/3)ⁿ+(5/6)ⁿ-(1/6)ⁿ-(1/3)ⁿ-(1/2)ⁿ

which leads to an easy summation for the expectation, namely

3+(2/3)³/(1/3)+(5/6)³/(1/6)-(1/3)³/(2/3)-(1/6)³/(5/6)=73/10
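The value 73/10 can be double-checked with a short Python sketch using exact fractions, plus a small simulation for both questions. The function names, seed and number of runs are illustrative choices of mine; the inclusion-exclusion formula coded in `tail_prob` is only valid for n≥1, which is all the derivation needs.

```python
import random
from fractions import Fraction
from itertools import combinations

PROBS = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}


def tail_prob(n):
    """P(N > n) for n >= 1: some value is still missing after n draws."""
    total = Fraction(0)
    for k in (1, 2):  # size of the set of values assumed missing
        for missing in combinations(PROBS, k):
            r = 1 - sum(PROBS[v] for v in missing)  # P(one draw avoids them)
            total += (-1) ** (k + 1) * r ** n
    return total


def expected_draws():
    """E[N] = 3 + sum_{n>=3} P(N > n), each geometric tail being r^3 / (1 - r)."""
    total = Fraction(3)
    for k in (1, 2):
        for missing in combinations(PROBS, k):
            r = 1 - sum(PROBS[v] for v in missing)
            total += (-1) ** (k + 1) * r ** 3 / (1 - r)
    return total


def simulate(n_runs=20_000, seed=1):
    """Monte Carlo averages of N and of the number of 1's among the N draws."""
    rng = random.Random(seed)
    total_draws = total_ones = 0
    for _ in range(n_runs):
        seen = set()
        while len(seen) < 3:
            x = rng.choices([1, 2, 3], weights=[3, 2, 1])[0]
            seen.add(x)
            total_draws += 1
            total_ones += (x == 1)
    return total_draws / n_runs, total_ones / n_runs


if __name__ == "__main__":
    print("exact E[N]       :", expected_draws())      # 73/10
    print("exact E[# of 1's]:", expected_draws() / 2)  # 73/20
    print("simulated        :", simulate())
```

The simulated averages should land close to 7.3 and 3.65, the latter being half the former as argued for the second question.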

## snapshot from München [#2]

Posted in pictures, Travel, University life on May 26, 2016 by xi'an

## Computing the variance of a conditional expectation via non-nested Monte Carlo

Posted in Books, pictures, Statistics, University life on May 26, 2016 by xi'an

The recent arXival by Takashi Goda of Computing the variance of a conditional expectation via non-nested Monte Carlo led me to read it as I could not be certain of the contents from only reading the title! The short paper considers the issue of estimating the variance of a conditional expectation when able to simulate the joint distribution behind the quantity of interest. The second moment E(E[f(X)|Y]²) can be written as a triple integral with two versions of x given y and one marginal y, which means that it can be approximated in an unbiased manner by simulating a realisation of y then conditionally two realisations of x. The variance requires a third simulation of x, which the author seems to deem too costly and that he hence replaces with another unbiased version based on two conditional generations only. (He notes that a faster biased version is available with bias going down faster than the Monte Carlo error, which makes the alternative somewhat irrelevant, as it is also costly to derive.) An open question after reading the paper stands with the optimal version of the generic estimator (5), although finding the optimum may require more computing time than it is worth spending. Another one is whether or not this version of the variance of the conditional expectation is more interesting (computation-wise) than the difference between the variance and the expected conditional variance as reproduced in (3), given that both quantities can equally be approximated by unbiased Monte Carlo…
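To fix ideas on the two-conditional-draws representation, here is a minimal Python sketch of one possible unbiased estimator of var(E[f(X)|Y]). It is not claimed to be the estimator (5) of the paper: the second moment is estimated by the average of f(x₁)f(x₂) over pairs simulated given the same y, and (E[f(X)])² by a product over distinct outer draws, which is my own construction. The toy model Y~N(0,1), X|Y~N(Y,1), f(x)=x, for which the target equals 1, and all function names are assumptions made for illustration.

```python
import random


def var_cond_expectation(sample_y, sample_x_given_y, f, n=50_000, seed=1):
    """Unbiased estimate of var(E[f(X)|Y]) from n outer draws of y,
    each followed by two conditional draws of x (no inner nested loop)."""
    rng = random.Random(seed)
    sum_prod = 0.0  # accumulates f(x1) * f(x2), unbiased for E(E[f|Y]^2)
    sum_g = 0.0     # accumulates g = (f(x1) + f(x2)) / 2
    sum_g2 = 0.0    # accumulates g^2
    for _ in range(n):
        y = sample_y(rng)
        f1 = f(sample_x_given_y(rng, y))
        f2 = f(sample_x_given_y(rng, y))
        sum_prod += f1 * f2
        g = 0.5 * (f1 + f2)
        sum_g += g
        sum_g2 += g * g
    second_moment = sum_prod / n
    # Average of g_i * g_j over i != j: unbiased for (E[f(X)])^2
    # because distinct outer draws are independent.
    mean_squared = (sum_g ** 2 - sum_g2) / (n * (n - 1))
    return second_moment - mean_squared


# Hypothetical toy model: Y ~ N(0,1), X | Y ~ N(Y,1), f(x) = x,
# so that E[f(X)|Y] = Y and the target variance is 1.
def toy_sample_y(rng):
    return rng.gauss(0.0, 1.0)


def toy_sample_x_given_y(rng, y):
    return rng.gauss(y, 1.0)


def identity(x):
    return x


if __name__ == "__main__":
    est = var_cond_expectation(toy_sample_y, toy_sample_x_given_y, identity)
    print("estimate of var(E[X|Y]):", est)  # the exact value is 1 in this toy model
```

The point of the sketch is only that two conditional simulations per outer y suffice for unbiasedness; which weighting of such terms is optimal is precisely the open question raised above.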

## the end of Series B!

Posted in Books, pictures, Statistics, University life on May 25, 2016 by xi'an

I received this news from the RSS today that all the RSS journals are turning 100% electronic. No paper version any longer! I deeply regret this move on which, as an RSS member, I would have appreciated being consulted, as I find it much easier to browse through the current issue when it arrives in my mailbox, rather than being at best reminded by an email that I will most likely ignore and erase. And as I consider the production of the journals the prime goal of the Royal Statistical Society. And as I read that only 25% of the members had opted so far for the electronic format, which does not sound to me like a majority. In addition, moving to electronic-only journals does not bring the perks one would expect from electronic journals:

• no bonuses like supplementary material, code, open or edited comments
• no reduction in the subscription rate of the journals and penalty fees if one still wants a paper version, which amounts to a massive increase in the subscription price
• no disengagement from the commercial publisher, whose role becomes even less relevant
• no access to the issues of the years one has paid for, once one stops subscribing.

“The benefits of electronic publishing include: faster publishing speeds; increased content; instant access from a range of electronic devices; additional functionality; and of course, environmental sustainability.”

The move is sold with typical marketing noise. But I do not buy it: publishing speeds will remain the same as driven by the reviewing part, I do not see where the contents are increased, and I cannot seriously read a journal article from my phone, so this range of electronic devices remains a gadget. Not happy!