## Le Monde puzzle [#990]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , on January 12, 2017 by xi'an

To celebrate the new year (assuming it is worth celebrating!), Le Monde mathematical puzzle came up with the following:

Two sequences (x¹,x²,…) and (y¹,y²,…) are defined as follows: the current value of x is either the previous value or twice the previous value, while the current value of y is the sum of the values of x up to now. What is the minimum number of steps to reach 2016 or 2017?

By considering that all consecutive powers of 2 must appear at least one, the puzzles boils down to finding the minimal number of replications in the remainder of the year minus the sum of all powers of 2. Which itself boils down to deriving the binary decomposition of that remainder. Hence the basic R code (using intToBits):

deco=function(k=2016){
m=trunc(log2(k))
while (sum(2^(0:m))>k) m=m-1
if (sum(2^(0:m))==k){ return(rep(1,m+1))
}else{
res=k-sum(2^(0:m))
return(rep(1,m+1)+as.integer(intToBits(res))[1:(m+1)])


which produces

> sum(deco(2016))
[1] 16
> sum(deco(2017))
[1] 16
> sum(deco(1789))
[1] 18


## a Galton-Watson riddle

Posted in R, Travel with tags , , , on December 30, 2016 by xi'an

The Riddler of this week has an extinction riddle which summarises as follows:

One observes a population of N individuals, each with a probability of 10⁻⁴ to kill the observer each day. From one day to the next, the population decreases by one individual with probability

K√N 10⁻⁴

What is the value of K that leaves the observer alive with probability ½?

Given the sequence of population sizes N,N¹,N²,…, the probability to remain alive is

$(1-10^{-4})^{N+N^1+\ldots}$

where the sum stops with the (sure) extinction of the population. Which is the moment generating function of the sum. At x=1-10⁻⁴. Hence the problem relates to a Galton-Watson extinction problem. However, given the nature of the extinction process I do not see a way to determine the distribution of the sum, except by simulation. Which returns K=27 for the specific value of N=9.

N=9
K=3*N
M=10^4
vals=rep(0,M)
targ=0
ite=1
while (abs(targ-.5)>.01){

for (t in 1:M){
gen=vals[t]=N
while (gen>0){
gen=gen-(runif(1)<K*sqrt(gen)/10^4)
vals[t]=vals[t]+gen}
}
targ=mean(exp(vals*log(.9999)))
print(c(as.integer(ite),K,targ))
if (targ<.5){ K=K*ite/(1+ite)}else{
K=K/(ite/(1+ite))}
ite=ite+1}


The solution proposed on The Riddler is more elegant in that the fixed point equation is

$\prod_{J=1}^9 \frac{K \cdot \sqrt{J}}{K \cdot \sqrt{J} + J}=\frac{1}{2}$

with a solution around K=27.

## flea circus

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , , , on December 8, 2016 by xi'an

An old riddle found on X validated asking for Monte Carlo resolution  but originally given on Project Euler:

A 30×30 grid of squares contains 30² fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random. What is the expected number of unoccupied squares after 50 bell rings, up to six decimal places?

The debate on X validated is whether or not a Monte Carlo resolution is feasible. Up to six decimals, certainly not. But with some lower precision, certainly. Here is a rather basic R code where the 50 steps are operated on the 900 squares, rather than the 900 fleas. This saves some time by avoiding empty squares.

xprmt=function(n=10,T=50){

mean=0
for (t in 1:n){

board=rep(1,900)
for (v in 1:T){

beard=rep(0,900)
if (board[1]>0){
poz=c(0,1,0,30)
ne=rmultinom(1,board[1],prob=(poz!=0))
beard[1+poz]=beard[1+poz]+ne}
#
for (i in (2:899)[board[-1][-899]>0]){
u=(i-1)%%30+1;v=(i-1)%/%30+1
poz=c(-(u>1),(u<30),-30*(v>1),30*(v<30))
ne=rmultinom(1,board[i],prob=(poz!=0))
beard[i+poz]=beard[i+poz]+ne}
#
if (board[900]>0){
poz=c(-1,0,-30,0)
ne=rmultinom(1,board[900],prob=(poz!=0))
beard[900+poz]=beard[900+poz]+ne}
board=beard}
mean=mean+sum(board==0)}
return(mean/n)}


The function returns an empirical average over n replications. With a presumably awkward approach to the borderline squares, since it involves adding zeros to keep the structure the same… Nonetheless, it produces an approximation that is rather close to the approximate expected value, in about 3mn on my laptop.

> exprmt(n=1e3)
[1] 331.082
> 900/exp(1)
[1] 331.0915


Further gains follow from considering only half of the squares, as there are two independent processes acting in parallel. I looked at an alternative and much faster approach using the stationary distribution, with the stationary being the Multinomial (450,(2/1740,3/1740…,4/1740,…,2/1740)) with probabilities proportional to 2 in the corner, 3 on the sides, and 4 in the inside. (The process, strictly speaking, has no stationary distribution, since it is periodic. But one can consider instead the subprocess indexed by even times.) This seems to be the case, though, when looking at the occupancy frequencies, after defining the stationary as:

inva=function(B=30){
return(c(2,rep(3,B-2),2,rep(c(3,
rep(4,B-2),3),B-2),2,rep(3,B-2),2))}


namely

> mn=0;n=1e8 #14 clock hours!
> proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30)
> for (t in 1:n)
+ mn=mn+table(rmultinom(1,450,prob=rep(1,450)))[1:4]
> mn=mn/n
> mn[1]=mn[1]-450
> mn
0      1      2     3
166.11 164.92  82.56 27.71
> exprmt(n=1e6) #55 clock hours!!
[1] 165.36 165.69 82.92 27.57

my original confusion being that the Poisson approximation had not yet taken over… (Of course, computing the first frequency for the stationary distribution does not require any simulation, since it is the sum of the complement probabilities to the power 450, i.e., 166.1069.)

## postdoc on missing data at École Polytechnique

Posted in Kids, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , on November 18, 2016 by xi'an

Julie Josse contacted me for advertising a postdoc position at École Polytechnique, in Palaiseau, south of Paris. “The fellowship is focusing on missing data. Interested graduates should apply as early as possible since the position will be filled when a suitable candidate is found. The Centre for Applied Mathematics (CMAP) is  looking for highly motivated individuals able to develop a general multiple imputation method for multivariate continuous and categorical variables and its implementation in the free R software. The successful candidate will be part of research group in the statistical team on missing values. The postdoc will also have excellent opportunities to collaborate with researcher in public health with partners on the analysis of a large register from the Paris Hospital (APHP) to model the decisions and events when severe trauma patients are handled by emergency doctors. Candidates should contact Julie Josse at polytechnique.edu.”

## copy code at your own peril

Posted in Books, Kids, R, Statistics, University life with tags , , , , , on November 14, 2016 by xi'an

I have come several times upon cases of scientists [I mean, real, recognised, publishing, senior scientists!] from other fields blindly copying MCMC code from a paper or website, and expecting the program to operate on their own problem… One illustration is from last week, when I read a X Validated question [from 2013] about an attempt of that kind, on a rather standard Normal posterior, but using an R code where the posterior function was not even defined. (I foolishly replied, despite having no expectation of a reply from the person asking the question.)

## Le Monde puzzle [#940]

Posted in Kids, Statistics, Travel, University life with tags , , , on November 11, 2016 by xi'an

A sudoku-like Le Monde mathematical puzzle:

On a 3×3 grid, all integers from 1 to 9 are present. Considering all differences between adjacent entries, the value of the grid is the minimum difference. What is the maximum possible value?

In a completely uninspired approach considering random permutations on {1,..,9}, the grid value can be computed as

neigh=c(1,2,4,5,7,8,1,4,2,5,3,6)
nigh=c(2,3,5,6,8,9,4,7,5,8,6,9)
perm=sample(9)
val<-function(perm){
min(abs(perm[neigh]-perm[nigh]))}

which produces a value of 3 for the maximal value. For a 4×4 grid

neigh=c(1:3,5:7,9:11,13:15,1+4*(0:2),2+4*(0:2),3+4*(0:2),4*(1:3))
nigh=c(2:4,6:8,10:12,14:16,1+4*(1:3),2+4*(1:3),3+4*(1:3),4*(2:4))
perm=sample(16)
val<-function(perm){
min(abs(perm[neigh]-perm[nigh]))}

the code returns 5. For the representation

[,1] [,2] [,3] [,4]
[1,] 8 13 3 11
[2,] 15 4 12 5
[3,] 9 14 6 16
[4,] 2 7 1 10

## ratio-of-uniforms [#2]

Posted in Books, R, Statistics with tags , , , , , , on October 31, 2016 by xi'an

Following my earlier post on Kinderman’s and Monahan’s (1977) ratio-of-uniform method, I must confess I remain quite puzzled by the approach. Or rather by its consequences. When looking at the set A of (u,v)’s in R⁺×X such that 0≤u²≤ƒ(v/u), as discussed in the previous post, it can be represented by its parameterised boundary

u(x)=√ƒ(x),v(x)=x√ƒ(x)    x in X

Similarly, since the simulation from ƒ(v/u) can also be derived [check Luc Devroye’s Non-uniform random variate generation in the exercise section 7.3] from a uniform on the set B of (u,v)’s in R⁺×X such that 0≤u≤ƒ(v+u), on the set C of (u,v)’s in R⁺×X such that 0≤u³≤ƒ(v/√u)², or on the set D of (u,v)’s in R⁺×X such that 0≤u²≤ƒ(v/u), which is actually exactly the same as A [and presumably many other versions!, for which I would like to guess the generic rule of construction], there are many sets on which one can consider running simulations. And one to pick for optimality?! Here are the three sets for a mixture of two normal densities:

For instance, assuming slice sampling is feasible on every one of those three sets, which one is the most efficient? While I have no clear answer to this question, I found on Sunday night that a generic family of transforms is indexed by a differentiable  monotone function h over the positive half-line, with the uniform distribution being taken over the set

H={(u,v);0≤u≤h(f(v/g(u))}

when the primitive G of g is the inverse of h, i.e., G(h(x))=x. [Here are the slides I gave at the Warwick reading group on Devroye’s book last week:]