## random walk on a torus [riddle]

Posted in Books, Kids, pictures with tags , , , , , , , , , on September 16, 2016 by xi'an

The Riddler of this week(-end) has a simple riddle to propose, namely given a random walk on the {1,2,…,N} torus with a ⅓ probability of death, what is the probability of death occurring at the starting point?

The question is close to William Feller’s famous Chapter III on random walks. With his equally famous reflection principle. Conditioning on the time n of death, which as we all know is definitely absorbing (!), the event of interest is a passage at zero, or any multiple of N (omitting the torus cancellation), at time n-1 (since death occurs the next time). For a passage in zero, this does not happen if n is even (since n-1 is odd) and else it is a Binomial event with probability

${n \choose \frac{n-1}{2}} 2^{-n}$

For a passage in kN, with k different from zero, kN+n must be odd and the probability is then

${n \choose \frac{n-1+kN}{2}} 2^{-n}$

which leads to a global probability of

$\sum_{n=0}^\infty \dfrac{2^n}{3^{n+1}} \sum_{k=-\lfloor (n-1)/N \rfloor}^{\lfloor (n+1)/N \rfloor} {n \choose \frac{n-1+kN}{2}} 2^{-n}$

i.e.

$\sum_{n=0}^\infty \dfrac{1}{3^{n+1}} \sum_{k=-\lfloor (n-1)/N \rfloor}^{\lfloor (n+1)/N \rfloor} {n \choose \frac{n-1+kN}{2}}$

Since this formula is rather unwieldy I looked for another approach in a métro ride [to downtown Paris to enjoy a drink with Stephen Stiegler]. An easier one is to allocate to each point on the torus a probability p[i] to die at position 1 and to solve the system of equations that is associated with it. For instance, when N=3, the system of equations is reduced to

$p_0=1/3+2/3 p_1, \quad p_1=1/3 p_0 + 1/3 p_1$

which leads to a probability of ½ to die at position 0 when leaving from 0. When letting N grows to infinity, the torus structure no longer matters and the probability of dying at position 0 implies returning in position 0, which is a special case of the above combinatoric formula, namely

$\sum_{m=0}^\infty \dfrac{1}{3^{2m+1}} {2m \choose m}$

which happens to be equal to

$\dfrac{1}{3}\,\dfrac{1}{\sqrt{1-4/9}}=\dfrac{1}{\sqrt{5}}\approx 0.4472$

as can be [unnecessarily] checked by a direct R simulation. This √5 is actually the most surprising part of the exercise!

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , , on May 27, 2015 by xi'an

Another trip in the métro today (to work with Pierre Jacob and Lawrence Murray in a Paris Anticafé!, as the University was closed) led me to infer—warning!, this is not the exact distribution!—the distribution of x, namely

$f(x|N) = \frac{4^p}{4^{\ell+2p}} {\ell+p \choose p}\,\mathbb{I}_{N=\ell+2p}$

since a path x of length l(x) will corresponds to N draws if N-l(x) is an even integer 2p and p undistinguishable annihilations in 4 possible directions have to be distributed over l(x)+1 possible locations, with Feller’s number of distinguishable distributions as a result. With a prior π(N)=1/N on N, hence on p, the posterior on p is given by

$\pi(p|x) \propto 4^{-p} {\ell+p \choose p} \frac{1}{\ell+2p}$

Now, given N and  x, the probability of no annihilation on the last round is 1 when l(x)=N and in general

$\frac{4^p}{4^{\ell+2p}}{\ell-1+p \choose p}\big/\frac{4^p}{4^{\ell+2p}}{\ell+p \choose p}=\frac{\ell}{\ell+p}=\frac{2\ell}{N+\ell}$

which can be integrated against the posterior. The numerical expectation is represented for a range of values of l(x) in the above graph. Interestingly, the posterior probability is constant for l(x) large  and equal to 0.8125 under a flat prior over N.

Getting back to Pierre Druilhet’s approach, he sets a flat prior on the length of the path θ and from there derives that the probability of annihilation is about 3/4. However, “the uniform prior on the paths of lengths lower or equal to M” used for this derivation which gives a probability of length l proportional to 3l is quite different from the distribution of l(θ) given a number of draws N. Which as shown above looks much more like a Binomial B(N,1/2).

However, being not quite certain about the reasoning involving Fieller’s trick, I ran an ABC experiment under a flat prior restricted to (l(x),4l(x)) and got the above, where the histogram is for a posterior sample associated with l(x)=195 and the gold curve is the potential posterior. Since ABC is exact in this case (i.e., I only picked N’s for which l(x)=195), ABC is not to blame for the discrepancy! I asked about the distribution on Stack Exchange maths forum (and a few colleagues here as well) but got no reply so far… Here is the R code that goes with the ABC implementation:

#observation:
elo=195
#ABC version
T=1e6
el=rep(NA,T)
N=sample(elo:(4*elo),T,rep=TRUE)
for (t in 1:T){
#generate a path
paz=sample(c(-(1:2),1:2),N[t],rep=TRUE)
#eliminate U-turns
uturn=paz[-N[t]]==-paz[-1]
while (sum(uturn>0)){
uturn[-1]=uturn[-1]*(1-
uturn[-(length(paz)-1)])
uturn=c((1:(length(paz)-1))[uturn==1],
(2:length(paz))[uturn==1])
paz=paz[-uturn]
uturn=paz[-length(paz)]==-paz[-1]
}
el[t]=length(paz)}
#subsample to get exact posterior
poster=N[abs(el-elo)==0]


Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on May 13, 2015 by xi'an

Pierre Druilhet arXived a note a few days ago about the Flatland paradox (due to Stone, 1976) and his arguments against the flat prior. The paradox in this highly artificial setting is as follows:  Consider a sequence θ of N independent draws from {a,b,1/a,1/b} such that

1. N and θ are unknown;
2. a draw followed by its inverse and this inverse are removed from θ;
3. the successor x of θ is observed, meaning an extra draw is made and the above rule applied.

Then the frequentist probability that x is longer than θ given θ is at least 3/4—at least because θ could be zero—while the posterior probability that x is longer than θ given x is 1/4 under the flat prior over θ. Paradox that 3/4 and 1/4 clash. Not so much of a paradox because there is no joint probability distribution over (x,θ).

The paradox was actually discussed at length in Larry Wasserman’s now defunct Normal Variate. From which I borrowed Larry’s graphical representation of the four possible values of θ given the (green) endpoint of x. Larry uses the Flatland paradox hammer to fix another nail on the coffin he contemplates for improper priors. And all things Bayes. Pierre (like others before him) argues against the flat prior on θ and shows that a flat prior on the length of θ leads to recover 3/4 as the posterior probability that x is longer than θ.

As I was reading the paper in the métro yesterday morning, I became less and less satisfied with the whole analysis of the problem in that I could not perceive θ as a parameter of the model. While this may sound a pedantic distinction, θ is a latent variable (or a random effect) associated with x in a model where the only unknown parameter is N, the total number of draws used to produce θ and x. The distributions of both θ and x are entirely determined by N. (In that sense, the flatland paradox can be seen as a marginalisation paradox in that an improper prior on N cannot be interpreted as projecting a prior on θ.) Given N, the distribution of x of length l(x) is then 1/4N times the number of ways of picking (N-l(x)) annihilation steps among N. Using a prior on N like 1/N , which is improper, then leads to favour the shortest path as well. (After discussing the issue with Pierre Druilhet, I realised he had a similar perspective on the issue. Except that he puts a flat prior on the length l(x).) Looking a wee bit further for references, I also found that Bruce Hill had adopted the same perspective of a prior on N.

## Rasmus’ socks fit perfectly!

Posted in Books, Kids, R, Statistics, University life with tags , , , , on November 10, 2014 by xi'an

Following the previous post on Rasmus’ socks, I took the opportunity of a survey on ABC I am currently completing to compare the outcome of his R code with my analytical derivation. After one quick correction [by Rasmus] of a wrong representation of the Negative Binomial mean-variance parametrisation [by me], I achieved this nice fit…

## Feller’s shoes and Rasmus’ socks [well, Karl’s actually…]

Posted in Books, Kids, R, Statistics, University life with tags , , , , on October 24, 2014 by xi'an

Yesterday, Rasmus Bååth [of puppies’ fame!] posted a very nice blog using ABC to derive the posterior distribution of the total number of socks in the laundry when only pulling out orphan socks and no pair at all in the first eleven draws. Maybe not the most pressing issue for Bayesian inference in the era of Big data but still a challenge of sorts!

Rasmus set a prior on the total number m of socks, a negative Binomial Neg(15,1/3) distribution, and another prior of the proportion of socks that come by pairs, a Beta B(15,2) distribution, then simulated pseudo-data by picking eleven socks at random, and at last applied ABC (in Rubin’s 1984 sense) by waiting for the observed event, i.e. only orphans and no pair [of socks]. Brilliant!

The overall simplicity of the problem set me wondering about an alternative solution using the likelihood. Cannot be that hard, can it?! After a few computations rejected by opposing them to experimental frequencies, I put the problem on hold until I was back home and with access to my Feller volume 1, one of the few [math] books I keep at home… As I was convinced one of the exercises in Chapter II would cover this case. After checking, I found a partial solution, namely Exercice 26:

A closet contains n pairs of shoes. If 2r shoes are chosen at random (with 2r<n), what is the probability that there will be (a) no complete pair, (b) exactly one complete pair, (c) exactly two complete pairs among them?

This is not exactly a solution, but rather a problem, however it leads to the value

$p_j=\binom{n}{j}2^{2r-2j}\binom{n-j}{2r-2j}\Big/\binom{2n}{2r}$

as the probability of obtaining j pairs among those 2r shoes. Which also works for an odd number t of shoes:

$p_j=2^{t-2j}\binom{n}{j}\binom{n-j}{t-2j}\Big/\binom{2n}{t}$

as I checked against my large simulations. So I solved Exercise 26 in Feller volume 1 (!), but not Rasmus’ problem, since there are those orphan socks on top of the pairs. If one draws 11 socks out of m socks made of f orphans and g pairs, with f+2g=m, the number k of socks from the orphan group is an hypergeometric H(11,m,f) rv and the probability to observe 11 orphan socks total (either from the orphan or from the paired groups) is thus the marginal over all possible values of k:

$\sum_{k=0}^{11} \dfrac{\binom{f}{k}\binom{2g}{11-k}}{\binom{m}{11}}\times\dfrac{2^{11-k}\binom{g}{11-k}}{\binom{2g}{11-k}}$

so it could be argued that we are facing a closed-form likelihood problem. Even though it presumably took me longer to achieve this formula than for Rasmus to run his exact ABC code!

## The Unimaginable Mathematics of Borges’ Library of Babel [book review]

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , on September 30, 2014 by xi'an

This is a book I carried away from JSM in Boston as the Oxford University Press representative kindly provided my with a copy at the end of the meeting. After I asked for it, as I was quite excited to see a book linking Jorge Luis Borges’ great Library of Babel short story with mathematical concepts. Even though many other short stories by Borges have a mathematical flavour and are bound to fascinate mathematicians, the Library of Babel is particularly prone to mathemati-sation as it deals with the notions of infinite, periodicity, permutation, randomness… As it happens, William Goldbloom Bloch [a patronym that would surely have inspired Borges!], professor of mathematics at Wheaton College, Mass., published the unimaginable mathematics of Borges’ Library of Babel in 2008, so this is not a recent publication. But I had managed to miss through the several conferences where I stopped at OUP exhibit booth. (Interestingly William Bloch has also published a mathematical paper on Neil Stephenson’s Cryptonomicon.)

Now, what is unimaginable in the maths behind Borges’ great Library of Babel??? The obvious line of entry to the mathematical aspects of the book is combinatorics: how many different books are there in total? [Ans. 10¹⁸³⁴⁰⁹⁷…] how many hexagons are needed to shelf that many books? [Ans. 10⁶⁸¹⁵³¹…] how long would it take to visit all those hexagons? how many librarians are needed for a Library containing all volumes once and only once? how many different libraries are there [Ans. 1010⁶…] Then the book embarks upon some cohomology, Cavalieri’s infinitesimals (mentioned by Borges in a footnote), Zeno’s paradox, topology (with Klein’s bottle), graph theory (and the important question as to whether or not each hexagon has one or two stairs), information theory, Turing’s machine. The concluding chapters are comments about other mathematical analysis of Borges’ Grand Œuvre and a discussion on how much maths Borges knew.

So a nice escapade through some mathematical landscapes with more or less connection with the original masterpiece. I am not convinced it brings any further dimension or insight about it, or even that one should try to dissect it that way, because it kills the poetry in the story, especially the play around the notion(s) of infinite. The fact that the short story is incomplete [and short on details] makes its beauty: if one starts wondering at the possibility of the Library or at the daily life of the librarians [like, what do they eat? why are they there? where are the readers? what happens when they die? &tc.] the intrusion of realism closes the enchantment! Nonetheless, the unimaginable mathematics of Borges’ Library of Babel provides a pleasant entry into some mathematical concepts and as such may initiate a layperson not too shy of maths formulas to the beauty of mathematics.

## Le Monde puzzle [#879]

Posted in Books, Kids with tags , , on September 21, 2014 by xi'an

Here is the last week puzzle posted in Le Monde:

Given an alphabet with 26 symbols, is it possible to create 27 different three-symbol words such that

1. all symbols within a word are different
2. all triplets of symbols are different
3. there is no pair of words with a single common symbol

Since there are

28x27x26/3×2=2925

such three-symbol words, it could be feasible to write an R code that builds the 27-uplet, assuming it exists. However, by breaking those words into primary words [that share no common symbols] and secondary words [that share two symbols with one primary word], it seems to me that there can be a maximum of 26 words under those three rules…