**A** paper by Au and Beck (2001) was mentioned during a talk at MCqMC 2018 in Rennes and I checked Probabilistic Engineering Mechanics for details. There is no clear indication that the subset simulation advocated therein is particularly effective. The core idea is to obtain the probability of belonging to a small set A by a cascading formula, namely the product of the probability of belonging to A¹, then the conditional probability of belonging to A² given A¹, etc., where the subsets A¹, A², …, ending with A itself, constitute a decreasing embedded sequence. The simulation conditional on belonging to one of the subsets is operated by a random-walk Metropolis-within-Gibbs scheme, with an additional rejection when the proposed value falls outside the said subset. (Surprisingly, the authors re-establish the validity of this scheme.) Hence the proposal faces similar issues as nested sampling, except that the nested subsets here are defined quite differently, as they are essentially free, provided they can be easily evaluated. Each of the random walks needs to be scaled, a harder task because this depends on the corresponding subset volume. The subsets themselves are rarely defined in a natural manner, except when they are tail events. And they need to be calibrated so that the conditional probability of falling into each remains large enough, the cost of this free choice. The Markov chain on the previous subset can prove useful to build the next subset, but there is no general principle behind this remark. (If any, this is connected with X entropy.) Otherwise, the past chains are very much wasted, compared with, say, an SMC treatment of the problem. The paper also notices that starting a Markov chain within the set means there is no burn-in time and hence that the probability estimators are unbiased. (This creates a correlation between successive Markov chains, but I think it could be ignored if the starting point were chosen at random or after a random number of extra steps.)
The authors further point out that the chain may fail to be ergodic if the proposal distribution lacks the energy to bridge separate regions of the current subset. They suggest using multiple chains with multiple starting points, which alleviates the issue only to some extent, as it ultimately depends on the spread of the starting points, as acknowledged in the paper.
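As an illustration of the cascading formula, here is a toy R rendering of subset simulation for the Gaussian tail probability P(X>4). The levels, chain length, and proposal scale are all my own choices, not the paper's; the thresholds are picked so that each conditional probability stays near 10%:

```r
set.seed(1)
# nested subsets A^j = {x > levl[j]}, ending with the target {x > 4}
levl=c(qnorm(1-10^-(1:4)),4)
M=1e4
x=rnorm(M)
p=mean(x>levl[1])             # unconditional first factor
for (j in 2:5){
  # seeds already inside A^{j-1}, resampled to M chains
  x=sample(x[x>levl[j-1]],M,replace=TRUE)
  for (t in 1:20){            # random-walk Metropolis within the subset
    y=x+rnorm(M,sd=.5)
    # extra rejection when the proposal falls outside A^{j-1}
    acc=(y>levl[j-1])&(runif(M)<dnorm(y)/dnorm(x))
    x[acc]=y[acc]}
  p=p*mean(x>levl[j])         # conditional probability factor
}
p  # compare with pnorm(-4)=3.17e-5
```

The calibration of the levels is exactly the "cost of this free choice" above: with conditional probabilities near 10%, each factor is estimated from a reasonable fraction of the chains.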

## Archive for random walk

## subset sampling

Posted in Statistics with tags MCMC algorithms, MCqMC 2018, nested sampling, Probabilistic Engineering Mechanics, random walk, Rennes, subset sampling, tail events, tail probabilities on July 13, 2018 by xi'an

## cyclic riddle [not in NYC!]

Posted in Kids, R, Running, Travel with tags car accidents, e-bike, electric bike, FiveThirtyEight, game, mathematical puzzle, New York city, R, random walk, reckless driving, The Riddler on December 29, 2017 by xi'an

**I**n the riddle of this week on FiveThirtyEight, the question is to find the average number of rounds when playing the following game: P=6 players sitting in a circle each have B=3 coins and, with probability ⅓ each, give one coin to their right-hand neighbour, give one coin to their left-hand neighbour, or dump one coin into the centre. The game ends when all coins are in the centre. Coding this experiment in R is rather easy:

```r
P=6;B=3 #six players, three coins each
situz=rep(B,P)
r=1
while (max(situz)>0){
  unz=runif(P)
  newz=situz-(situz>0)
  for (i in (1:P)[unz<1/3])
    newz[i%%P+1]=newz[i%%P+1]+(situz[i]>0)
  for (i in (1:P)[unz>2/3])
    newz[i-1+P*(i==1)]=newz[i-1+P*(i==1)]+(situz[i]>0)
  situz=newz
  r=r+1}
```

resulting in an average of 15.58, but I cannot figure out (quickly enough) an analytic approach to the problem. (The fit above is to an estimated Gamma distribution.)
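To reproduce that average, the single run above can be wrapped into a function and replicated (a quick sketch; the name coinz is mine):

```r
# one game: returns the number of rounds until all coins are centred
coinz=function(P=6,B=3){
  situz=rep(B,P)
  r=1
  while (max(situz)>0){
    unz=runif(P)
    newz=situz-(situz>0)       # every player holding a coin gives one away
    for (i in (1:P)[unz<1/3])  # passed to the right neighbour
      newz[i%%P+1]=newz[i%%P+1]+(situz[i]>0)
    for (i in (1:P)[unz>2/3])  # passed to the left neighbour
      newz[i-1+P*(i==1)]=newz[i-1+P*(i==1)]+(situz[i]>0)
    situz=newz                 # middle third of unz: coin dumped in centre
    r=r+1}
  r}
avgz=mean(replicate(1e4,coinz()))
avgz  # about 15.6
```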

In a completely unrelated aparté (aside), I read earlier this week that New York City had prohibited the use of electric bikes. I was unsure of what this meant (prohibited on sidewalks? expressways? metro carriages?) so I rechecked and found that electric bikes are simply not allowed for use anywhere in New York City, because of the potential for “reckless driving”. The target is apparently the high number of delivery e-bikes, but this ban sounds so absurd that I cannot understand how it passed. Especially when De Blasio has committed the city to the Paris climate agreement after Trump moronically dumped it… Banning cars would sound much more in tune with this commitment! (A further aparté is that I strongly dislike e-bikes, running on nuclear power plants, especially when they pass me on sharp hills!, but they are clearly a lesser evil when compared with mopeds and cars.)

## splitting a field by annealing

Posted in Kids, pictures, R, Statistics with tags Autrans, Luxembourg, mathematical puzzle, Palais Ducal, R, random walk, simulated annealing, The Riddler, Vercors on October 18, 2017 by xi'an

**A** recent riddle [from The Riddler] that I pondered about during a [long!] drive to Luxembourg last weekend was about splitting a square field into three lots of identical surface for a minimal length of separating wire… While this led me to conclude that the best solution was a T-like separation, I ran a simulated annealing R code on my train trip to ~~Autrans~~ Valence, seemingly in agreement with this conclusion. I discretised the square into n² units and explored configurations by switching two units with different colours, following a simulated annealing schedule (although unable to impose connectivity on the three regions!):

```r
#assumes n (grid side), n2=n^2, T (number of moves), rep (repetition
#index), and bourz(v,partz), returning the number of neighbours of
#unit v sharing its colour
partz=matrix(1,n,n)
partz[,1:(n/3)]=2;partz[((n/2)+1):n,((n/3)+1):n]=3
#counting adjacent units of same colour
nood=hood=matrix(4,n,n)
for (v in 1:n2) hood[v]=bourz(v,partz)
minz=el=sum(4-hood)
for (t in 1:T){
  colz=sample(1:3,2) #picks colours
  a=sample((1:n2)[(partz==colz[1])&(hood<4)],1)
  b=sample((1:n2)[(partz==colz[2])&(hood<4)],1)
  partt=partz;partt[b]=colz[1];partt[a]=colz[2]
  #collection of squares impacted by switch
  nood=hood
  voiz=unique(c(a,a-1,a+1,a+n,a-n,b-1,b,b+1,b+n,b-n))
  voiz=voiz[(voiz>0)&(voiz<n2)]
  for (v in voiz) nood[v]=bourz(v,partt)
  if (nood[a]*nood[b]>0){
    difz=sum(nood)-sum(hood)
    if (log(runif(1))<difz^3/(n^3)*(1+log(10*rep*t)^3)){
      el=el-difz;partz=partt;hood=nood
      if (el<minz){ minz=el;cool=partz}
    }}}
```

(where bourz computes the number of neighbours of a unit sharing its colour), which produces completely random patterns at high temperatures (small t) and returns to the T configuration (more or less), if not always. Once the (a?) solution was posted on The Riddler, it appeared that a triangular (Y) version proved better than the T one [if not started from corners], with a gain of 3%, and that a curved separation was even better, with an extra gain of less than 1% [a solution I find quite surprising, as straight lines should improve upon curved ones…]

## Le Monde puzzle [#1024]

Posted in Books, Kids with tags Bertrand's paradox, competition, Le Monde, mathematical puzzle, Monty Hall problem, R, random walk, Université Paris Dauphine on October 10, 2017 by xi'an

**T**he penultimate and appropriately somewhat Monty-Hallesque Le Monde mathematical puzzle of the competition!

A dresser with 5×5 drawers contains a single object in one of the 25 drawers. A player opens a drawer at random and, after each choice, the object moves at random to a drawer adjacent to its current location, while the drawer chosen by the player remains open. What is the maximum number of drawers one needs to open to find the object?

In a dresser with 9 drawers in a line, containing again a single object, the player opens drawers one at a time, after which the open drawer is closed and the object moves to one of the drawers adjacent to its current location. What is the maximum number of drawers one needs to open to find the object?

**F**or the first question, setting a pattern of exploration and, given this pattern, simulating a random walk trying to avoid the said pattern as long as possible is feasible, returning a maximum number of steps over many random walks [and hence a lower bound on the true maximum], as in the following code:

```r
sefavyd=function(pater=seq(1,49,2)%%25+1){
  fild=matrix(0,5,5)
  m=pater[1];i=fild[m]=1
  t=sample((1:25)[-m],1)
  nomove=FALSE
  while (!nomove){
    i=i+1
    m=pater[i];fild[m]=1
    if (t==m){ nomove=TRUE}else{
      muv=NULL
      if ((t-1)%%5>0) muv=c(muv,t-1)
      if (t%%5>0) muv=c(muv,t+1)
      if ((t-1)%/%5>0) muv=c(muv,t-5)
      if (t%/%5<4) muv=c(muv,t+5)
      muv=muv[fild[muv]==0]
      nomove=(length(muv)==0)
      if (!nomove) t=sample(rep(muv,2),1)}
  }
  return(i)}
```

But a direct reasoning starts from the observation that, as long as two adjacent drawers remain unopened, a random walk can, with non-zero probability, switch indefinitely between these two drawers. Hence, a sure recovery of the object requires opening one drawer out of two, in a checkerboard pattern. The minimal number of drawers to open on a 5×5 dresser is thus 2+3+2+3+2=12. Since in 12 steps those drawers are all open, spotting the object may require up to 13 steps.

For the second case, unless I [again!] misread the question, whatever pattern one picks for the exploration, there is always a non-zero probability of avoiding discovery after an arbitrary number of steps. The [wrong!] answer is thus infinity. To cross-check this reasoning, I wrote the following R code, which mimics a random pattern of exploration, associated with an opportunistic random walk that avoids discovery whenever possible (even with very low probability) by pushing the object towards the centre:

```r
drawl=function(){
  i=1;t=5;nomove=FALSE
  m=sample((1:9)[-t],1)
  while (!nomove){
    nextm=sample((1:9),1)
    muv=c(t-1,t+1)
    muv=muv[(muv>0)&(muv<10)&(muv!=nextm)]
    nomove=(length(muv)==0)||(i>1e6)
    if (!nomove) t=sample(rep(muv,2),1,
      prob=1/(5.5-rep(muv,2))^4)
    i=i+1}
  return(i)}
```

which returns unlimited values on repeated runs. However, I was wrong and the R code was unable to dismiss my a priori!, as later discussions with Robin and Julien at Paris-Dauphine exhibited ways of terminating the random walk in 18, then 15, then 14 steps! The idea is to push the target towards one of the endpoints, where it has no option but to turn back: an opening pattern like 2, 3, 4, 5, 6, 7, 8, 8 takes care of a hidden object starting in an even drawer, while the following openings 7, 6, 5, 4, 3, 2 terminate any random path starting from an odd drawer. To double-check:

```r
grawl=function(){
  len=0;muvz=c(3:8,8:1)
  for (t in 1:9){
    i=1;m=muvz[i];nomove=(t==m)
    while (!nomove){
      i=i+1;m=muvz[i];muv=c(t-1,t+1)
      muv=muv[(muv>0)&(muv<10)&(muv!=m)]
      nomove=(length(muv)==0)
      if (!nomove) t=sample(rep(muv,2),1)}
    len=max(len,i)}
  return(len)}
```

produces the value 14.

## flea circus

Posted in Books, Kids, pictures, R, Statistics with tags coupling, cross validated, fleas, Leonhard Euler, Markov chains, Markov random field, Monte Carlo integration, occupancy, Poisson approximation, R, random walk, stationarity on December 8, 2016 by xi'an

**A**n old riddle found on X validated asking for Monte Carlo resolution but originally given on Project Euler:

A 30×30 grid of squares contains 30² fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random. What is the expected number of unoccupied squares after 50 bell rings, up to six decimal places?

The debate on X validated is whether or not a Monte Carlo resolution is feasible. Up to six decimals, certainly not. But with some lower precision, certainly. Here is a rather basic R code where the 50 steps are operated on the 900 squares, rather than the 900 fleas. This saves some time by avoiding empty squares.

```r
exprmt=function(n=10,T=50){
  mean=0
  for (t in 1:n){
    board=rep(1,900)
    for (v in 1:T){
      beard=rep(0,900)
      if (board[1]>0){
        poz=c(0,1,0,30)
        ne=rmultinom(1,board[1],prob=(poz!=0))
        beard[1+poz]=beard[1+poz]+ne}
      #
      for (i in (2:899)[board[-1][-899]>0]){
        u=(i-1)%%30+1;v=(i-1)%/%30+1
        poz=c(-(u>1),(u<30),-30*(v>1),30*(v<30))
        ne=rmultinom(1,board[i],prob=(poz!=0))
        beard[i+poz]=beard[i+poz]+ne}
      #
      if (board[900]>0){
        poz=c(-1,0,-30,0)
        ne=rmultinom(1,board[900],prob=(poz!=0))
        beard[900+poz]=beard[900+poz]+ne}
      board=beard}
    mean=mean+sum(board==0)}
  return(mean/n)}
```

The function returns an empirical average over n replications, with a presumably awkward handling of the borderline squares, since it involves adding zeros to keep the structure the same… Nonetheless, it produces an approximation rather close to the approximate expected value, in about 3mn on my laptop.

```r
> exprmt(n=1e3)
[1] 331.082
> 900/exp(1)
[1] 331.0915
```

Further gains follow from considering only half of the squares, as there are two independent processes acting in parallel [fleas on the two parities of the checkerboard never mix]. I looked at an alternative and much faster approach using the stationary distribution, namely the Multinomial (450; 2/1740, 3/1740, …, 4/1740, …, 2/1740), with probabilities proportional to 2 in the corners, 3 on the sides, and 4 in the inside. (Strictly speaking, the process has *no stationary distribution*, since it is periodic. But one can consider instead the subprocess indexed by even times.) This seems to be confirmed when looking at the occupancy frequencies, after defining the stationary weights as:

```r
inva=function(B=30){
  return(c(2,rep(3,B-2),2,
    rep(c(3,rep(4,B-2),3),B-2),
    2,rep(3,B-2),2))}
```

namely

```r
> mn=0;n=1e8 #14 clock hours!
> proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30)
> for (t in 1:n)
+   mn=mn+table(rmultinom(1,450,prob=proz[proz>0]))[1:4]
> mn=mn/n
> mn
     0      1      2      3
166.11 164.92  82.56  27.71
> exprmt(n=1e6) #55 clock hours!!
[1] 165.36 165.69  82.92  27.57
```

my original confusion being that the Poisson approximation had not yet taken over… (Of course, computing the first frequency for the stationary distribution does not require any simulation, since the expected number of empty squares is the sum of the complement probabilities to the power 450, i.e., Σᵢ (1−pᵢ)⁴⁵⁰ = 166.1069.)
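That exact value takes a few lines of R to confirm, reusing the inva weights defined above for the 450 squares of one parity:

```r
inva=function(B=30){
  return(c(2,rep(3,B-2),2,rep(c(3,rep(4,B-2),3),B-2),2,rep(3,B-2),2))}
# weights of the 450 squares of one parity: 2 for the corners, 3 for
# the side squares, 4 for the interior squares, summing to 1740
proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30)
p=proz[proz>0]/1740
# expected number of empty squares under the Multinomial(450,p)
emptz=sum((1-p)^450)
emptz  # close to 166.1069
```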

## random walk on a torus [riddle]

Posted in Books, Kids, pictures with tags An Introduction to Probability Theory and Its Applications, Bayes in Paris, combinatorics, Henry W. Gould, métro, random walk, reflection principle, Stephen Stigler, The Riddler, William Feller on September 16, 2016 by xi'an

The Riddler of this week(-end) has a simple riddle to propose, namely *given a random walk on the {1,2,…,N} torus with a ⅓ probability of death, what is the probability of death occurring at the starting point?*

The question is close to William Feller’s famous Chapter III on random walks, with his equally famous reflection principle. Conditioning on the time n of death, which as we all know is definitely absorbing (!), the event of interest is a passage at zero, or at any multiple of N (omitting the torus cancellation), at time n−1 (since death occurs the next time). For a passage at zero, this does not happen if n is even (since n−1 is then odd) and otherwise it is a Binomial event with probability

$$\binom{n-1}{(n-1)/2}\,2^{-(n-1)}$$

For a passage at kN, with k different from zero, kN+n must be odd and the probability is then

$$\binom{n-1}{(n-1+kN)/2}\,2^{-(n-1)}$$

which leads to a global probability of

$$\sum_{n=1}^{\infty}\frac{1}{3}\left(\frac{2}{3}\right)^{n-1}\sum_{k=-\infty}^{+\infty}\binom{n-1}{(n-1+kN)/2}\,2^{-(n-1)}$$

(with the convention that the binomial coefficient is zero when (n−1+kN)/2 is not an integer between 0 and n−1),

i.e.

$$\frac{1}{3}\sum_{m=0}^{\infty}3^{-m}\sum_{k=-\infty}^{+\infty}\binom{m}{(m+kN)/2}$$
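A quick numerical check of this double sum, under my reading of the formula (summing the terms C(m,(m+kN)/2)3⁻⁽ᵐ⁺¹⁾ over m and k, with truncation bounds of my choosing), for N=3:

```r
# evaluate the global death-at-origin probability for N=3,
# truncating the outer sum at m=200 (the terms decay like (2/3)^m)
N=3; tot=0
for (m in 0:200) for (k in -70:70){
  j=(m+k*N)/2
  # keep only the valid binomial terms: j integer in {0,...,m}
  if (j==round(j) && j>=0 && j<=m) tot=tot+choose(m,j)/3^(m+1)}
tot  # 0.5, the probability of dying at 0 when leaving from 0
```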

Since this formula is rather unwieldy, I looked for another approach during a métro ride [to downtown Paris, to enjoy a drink with Stephen Stigler]. An easier one is to allocate to each point i on the torus the probability p[i] of dying at position 0 when starting from i, and to solve the associated system of equations. For instance, when N=3, the system reduces to

$$p_0=\tfrac{1}{3}+\tfrac{1}{3}p_1+\tfrac{1}{3}p_2,\qquad p_1=\tfrac{1}{3}p_0+\tfrac{1}{3}p_2,\qquad p_2=\tfrac{1}{3}p_0+\tfrac{1}{3}p_1$$

which leads to a probability of ½ of dying at position 0 when leaving from 0. When letting N grow to infinity, the torus structure no longer matters and dying at position 0 amounts to returning to position 0, which is a special case of the above combinatorial formula, namely

$$\frac{1}{3}\sum_{m=0}^{\infty}\binom{2m}{m}3^{-2m}$$

which happens to be equal to

$$\frac{1}{3}\,\frac{1}{\sqrt{1-4/9}}=\frac{1}{\sqrt{5}}$$

as can be [unnecessarily] checked by a direct R simulation. This √5 is actually the most surprising part of the exercise!
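Such a direct R simulation could look like the following sketch, run on ℤ as a stand-in for the large-N torus (the sample sizes are my own choices):

```r
set.seed(42)
# walk on Z: at each step, die w.p. 1/3, else move +-1 w.p. 1/3 each;
# estimate the probability that death occurs at the starting point
dethz=replicate(1e5,{
  x=0
  while (runif(1)>1/3) x=x+sample(c(-1,1),1)
  x==0})
mean(dethz)  # about 1/sqrt(5) = 0.4472
```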