## extinction minus one

Posted in Books, Kids, pictures, R, Statistics, University life on March 14, 2022 by xi'an

The riddle from The Riddler of 19 Feb. is about the Bernoulli Galton-Watson process, where each individual in the population has zero or one descendant with equal probabilities: starting with a large population of size N, what is the probability that the size of the population on the brink of extinction is equal to one? While it is easy to show that the probability that the n-th generation is extinct is

$\mathbb{P}(S_n=0) = \left(1 - \frac{1}{2^{n}}\right)^N$

I could not find a way to express the probability to hit one and resorted to brute force simulation, easily coded

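T <- 1e8; N <- 1e4; F <- 0   # F counts the runs that pass through size one
for (t in 1:T) {
  Z <- N
  while (Z > 1) Z <- rbinom(1, Z, .5)   # each individual survives with probability 1/2
  F <- F + Z                            # Z ends at 1 (hit one) or 0 (jumped over one)
}
F / T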


which produces an approximate probability of 0.7213 or 0.714. The impact of N is quickly vanishing, as expected when the probability to reach 1 in one generation is negligible…
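The vanishing impact of N can be seen with a vectorised variant of the same simulation. This is only a sketch: the helper name `hit_one`, the number of runs (1e4) and the grid of values of N are illustrative assumptions, not part of the original code.

```r
# Hypothetical helper: fraction of T independent runs, each started at N,
# whose population passes through size one before extinction
hit_one <- function(N, T = 1e4) {
  Z <- rep(N, T)                       # T populations simulated in parallel
  while (any(Z > 1)) {
    alive <- Z > 1                     # only populations above one keep evolving
    Z[alive] <- rbinom(sum(alive), Z[alive], 0.5)
  }
  mean(Z)                              # each Z is now 1 (hit) or 0 (miss)
}

set.seed(1)
sapply(c(10, 100, 1000, 10000), hit_one)
```

With only 1e4 runs the Monte Carlo error is of order 0.005, so this comparison is coarse and only shows that the estimates barely move with N.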

However, when returning to Dauphine after a two-week absence, I presented the problem to my probabilist neighbour François Simenhaus, who immediately pointed out that this probability was more simply seen as the probability that the maximum of N independent geometric rv’s is achieved by a single one among the N. Searching later for a reference on that probability, I came across the 1990 paper of Bruss and O’Cinneide, which shows that the probability of uniqueness of the maximum does not converge as N goes to infinity, but rather fluctuates around 0.72135 with logarithmic periodicity. It is only when N=2^n that the sequence converges to 0.721521… This probability can actually be written in closed form as

$N\sum_{i=1}^\infty 2^{-i-1}(1-2^{-i})^{N-1}$

(which is obvious in retrospect, although the original paper contains a typo, a missing ½ factor in equation (17)), while its asymptotic behaviour is not obvious at all, as noted by the authors.
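Both the closed form and the geometric representation are easy to check numerically. A minimal sketch, under stated assumptions: the series is truncated at an arbitrary 200 terms, lifetimes are drawn with `rgeom(N, 0.5)` (which counts failures before the first success, a shift of one that does not affect uniqueness of the maximum), and the function names `unique_max` and `geo_check` are hypothetical.

```r
# Truncated evaluation of N * sum_{i>=1} 2^(-i-1) * (1 - 2^(-i))^(N-1)
unique_max <- function(N, imax = 200) {
  i <- 1:imax
  N * sum(2^(-i - 1) * (1 - 2^(-i))^(N - 1))
}

# Monte Carlo check: is the maximum of N geometric variates attained only once?
geo_check <- function(N, T = 1e4) {
  mean(replicate(T, {
    g <- rgeom(N, 0.5)
    sum(g == max(g)) == 1
  }))
}

sapply(2^(5:15), unique_max)     # closed form along N = 2^n
set.seed(1)
c(closed = unique_max(1000), simulated = geo_check(1000))
```

Double precision and a Monte Carlo sample of this size will not resolve the small log-periodic fluctuations described by Bruss and O’Cinneide; the sketch only confirms that the two representations agree to a couple of digits.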

On the historical side, and in accordance with Stigler’s law, the Galton-Watson process should have been called the Bienaymé process! (Bienaymé was a student of Laplace, who successively lost positions for his political ideas, before eventually joining the Académie des Sciences, and later founding the Société Mathématique de France.)

## precision in MCMC

Posted in Books, R, Statistics, University life on January 14, 2016 by xi'an

While browsing Images des Mathématiques, I came across this article [in French] that studies the impact of round-off errors on number representations in a dynamical system, and I checked how much of an issue this is for MCMC algorithms like the slice sampler (recycling some R code from Monte Carlo Statistical Methods), by simply adding a few signif(…,dig=n) calls to the original R code and letting the precision n vary.
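A minimal sketch of this kind of experiment, which is not the code from Monte Carlo Statistical Methods: it assumes a standard normal truncated to (0, b), a two-step slice sampler, and rounding of every simulated value with `signif`. The function name `slice_tnorm`, the bound b = 3 and the starting point are all illustrative assumptions.

```r
# Hypothetical slice sampler for a N(0,1) density truncated to (0, b),
# with each simulated value rounded to `dig` significant digits
slice_tnorm <- function(niter, dig, b = 3) {
  x <- numeric(niter)
  x[1] <- signif(b / 2, digits = dig)                 # arbitrary starting point
  for (t in 2:niter) {
    # vertical step: uniform under the (unnormalised) density at the current x
    u <- signif(runif(1, 0, exp(-x[t - 1]^2 / 2)), digits = dig)
    # horizontal step: uniform on the slice {x in (0, b): exp(-x^2/2) >= u}
    r <- sqrt(-2 * log(u))
    x[t] <- signif(runif(1, 0, min(b, r)), digits = dig)
  }
  x
}

set.seed(1)
for (dig in c(2, 3, 6)) {
  x <- slice_tnorm(1e4, dig)
  cat(dig, "digits: mean", mean(x), "distinct values", length(unique(x)), "\n")
}
```

Comparing histograms of the three chains against the truncated normal density is the natural next step for seeing at which precision the fit starts to degrade.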

“…if one simulates trajectories over very long time intervals, too long relative to the chosen numerical precision, then quite often the results of the simulations will be completely different from what happens in reality…” Pierre-Antoine Guihéneuf

Rather unsurprisingly (!), using a small enough precision (like two digits on the first row) has a visible impact on the simulation of a truncated normal. Moving to three digits seems to be sufficient in this example… One thing this tiny experiment reminds me of is the lumpability property of Kemeny and Snell, a restriction on Markov chains for aggregated (or discretised) versions to be ergodic or even Markov. Also, in 2000, Laird Breyer, Gareth Roberts and Jeff Rosenthal wrote a Statistics and Probability Letters paper on the impact of round-off errors on geometric ergodicity. However, I presume [maybe foolishly!] that the result stated in the original paper, namely that there exists an infinite number of precision digits for which the dynamical system degenerates into a small region of the space, does not hold for MCMC. Maybe foolishly so, because the above statement means that running a dynamical system for “too” long given the chosen precision kills the intended stationary properties of the system, which I interpret as getting non-ergodic behaviour when exceeding the period of the uniform generator. More or less.

## Statistique dans Le Monde

Posted in University life on November 5, 2012 by xi'an

Again, some relevant entries in the weekend edition of Le Monde: a paper on Nate Silver and his FiveThirtyEight blog, with a short description of his statistical approach, namely to pool all existing polls in a sort of meta-analysis, although not going as far as mentioning LOESS or nearest neighbour regression techniques. [Even less Bayesian!] For this, the FAQ of FiveThirtyEight is much more explicit:

Firstly, we assign each poll a weighting based on that pollster’s historical track record, the poll’s sample size, and the recentness of the poll. More reliable polls are weighted more heavily in our averages.

Secondly, we include a regression estimate based on the demographics in each state among our ‘polls’, which helps to account for outlier polls and to keep the polling in its proper context.

Thirdly, we use an inferential process to compute a rolling trendline that allows us to adjust results in states that have not been polled recently and make them ‘current’.

Fourthly, we simulate the election 10,000 times for each site update in order to provide a probabilistic assessment of electoral outcomes based on a historical analysis of polling data since 1952. The simulation further accounts for the fact that similar states are likely to move together, e.g. future polling movement in states like Michigan and Ohio, or North and South Carolina, is likely to be in the same direction.
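The first and fourth steps of this FAQ can be sketched in a few lines of R. This is a toy illustration and in no way FiveThirtyEight's actual model: every number below is made up, and the weighting formula, the two-state setting and the shared-swing structure are assumptions chosen only to show the mechanics.

```r
# Toy data: all values are invented for illustration, not real polls
polls <- data.frame(
  margin = c(2.0, -1.0, 3.5),    # candidate lead in points (hypothetical)
  n      = c(800, 500, 1200),    # sample sizes (hypothetical)
  age    = c(2, 10, 5),          # days since the poll was taken (hypothetical)
  rating = c(1.0, 0.6, 0.8)      # pollster reliability score (hypothetical)
)

# One possible weighting: reliability x sqrt(sample size) x decay with age
w <- with(polls, rating * sqrt(n) * exp(-age / 7))
avg <- weighted.mean(polls$margin, w)

# 10,000 simulated elections in two similar states that share a common swing
set.seed(1)
nsim <- 1e4
shared <- rnorm(nsim, 0, 2)                      # movement common to both states
stateA <- avg + shared + rnorm(nsim, 0, 2)       # plus state-specific noise
stateB <- avg - 1 + shared + rnorm(nsim, 0, 2)   # state B assumed one point less favourable
c(winA = mean(stateA > 0), winB = mean(stateB > 0),
  both = mean(stateA > 0 & stateB > 0))
```

Because of the shared swing, the probability of winning both states exceeds the product of the two marginal probabilities, which is the "similar states move together" effect described in the FAQ.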

The second paper is a tribune written by Marc Lavielle, senior researcher at INRIA Saclay, on the (French) debate surrounding the recent publication of a study by Séralini et al. on the toxicity of the genetically modified NK603 (Monsanto) corn. Part of the controversy stems from the fact that this paper was distributed to the media prior to its publication, with a confidentiality contract that prevented the media from consulting other experts (but not from publishing nonsensical definitive headlines). Another part of the controversy comes from the publication by six of the French Académies (namely, Science, Agriculture, Medicine, Pharmacy, Technologies, and Veterinary) of a statement concluding that the Food and Chemical Toxicology paper by Séralini et al. lacks reliability. It was followed by another tribune written by Paul Deheuvels, professor of statistics at Université Pierre et Marie Curie and member of the Académie des Sciences, in which he disagrees with the opinion expressed in this statement and legitimately complains about not being consulted, while being the sole statistician member of the Academy of Sciences. (This debate was also reported in the recent October recap of CNRS Images des Mathématiques.)