## one or two?

Posted in Books, Kids, R with tags , , , , , , on March 12, 2020 by xi'an

A superposition of two random walks from The Riddler:

Starting from zero, a random walk is produced by choosing moves between ±1 and ±2 at each step. If the choice between both is made towards maximising the probability of ending up positive after 100 steps, what is this probability?

Although the optimal path is not necessarily made of moves that optimise the probability of ending up positive after the remaining steps, I chose to follow a dynamic programming approach by picking between ±1 and ±2 at each step based on that probability:

```bs=matrix(0,405,101) #best stategy with value i-203 at time j-1
bs[204:405,101]=1
for (t in 100:1){
tt=2*t
bs[203+(-tt:tt),t]=.5*apply(cbind(
bs[204+(-tt:tt),t+1]+bs[202+(-tt:tt),t+1],
bs[201+(-tt:tt),t+1]+bs[205+(-tt:tt),t+1]),1,max)}
```

resulting in the probability

```> bs[203,1]
[1] 0.6403174
```

Just checking that a simple strategy of picking ±1 above zero and ±2 below leads to the same value

```ga=rep(0,T)
for(v in 1:100) ga=ga+(1+(ga<1))*sample(c(-1,1),T,rep=TRUE)
```

or sort of

```> mean(ga>0)
[1] 0.6403494
```

With highly similar probabilities when switching at ga<2

```> mean(ga>0)
[1] 0.6403183
```

or ga<0

```> mean(ga>0)
[1] 0.6403008
```

and too little difference to spot a significant improvement between the three boundaries.

## an elegant sampler

Posted in Books, Kids, R, University life with tags , , , , , , , on January 15, 2020 by xi'an

Following an X validated question on how to simulate a multinomial with fixed average, W. Huber produced a highly elegant and efficient resolution with the compact R code

```tabulate(sample.int((k-1)*n, s-n) %% n + 1, n) + 1
```

where k is the number of classes, n the number of draws, and s equal to n times the fixed average. The R function sample.int is an alternative to sample that seems faster. Breaking the outcome of

```sample.int((k-1)*n, s-n)
```

as nonzero positions in an n x (k-1) matrix and adding a adding a row of n 1’s leads to a simulation of integers between 1 and k by counting the 1’s in each of the n columns, which is the meaning of the above picture. Where the colour code is added after counting the number of 1’s. Since there are s 1’s in this matrix, the sum is automatically equal to s. Since the s-n positions are chosen uniformly over the n x (k-1) locations, the outcome is uniform. The rest of the R code is a brutally efficient way to translate the idea into a function. (By comparison, I brute-forced the question by suggesting a basic Metropolis algorithm.)

## riddle on a circle

Posted in Books, Kids, R, Travel with tags , , , , , , , on December 22, 2019 by xi'an

The Riddler’s riddle this week provides another opportunity to resort to brute-force simulated annealing!

Given a Markov chain defined on the torus {1,2,…,100} with only moves a drift to the right (modulo 100) and a uniformely random jump, find the optimal transition matrix to reach 42 in a minimum (average) number of moves.

Which I coded in my plane to Seattle, under the assumption that there is nothing to do when the chain is already in 42. And the reasoning that there is not gain (on average) in keeping the choice between right shift and random jump random.

```dure=min(c(41:0,99:42),50)
temp=.01
for (t in 1:1e6){
i=sample((1:100)[-42],1)
dura=1+mean(dure)
if (temp*log(runif(1))<dure[i]-dura) dure[i]=dura
if(temp*log(runif(1))<dure[i]-(dura<-1+dure[i*(i<100)+1]))
dure[i]=dura
temp=temp/(1+.1e-4*(runif(1)>.99))}
```

In all instances, the solution is to move at random for any position but those between 29 and 41, for an average 13.64286 number of steps to reach 42. (For values outside the range 29-42.)

## subset sampling

Posted in Statistics with tags , , , , , , , , on July 13, 2018 by xi'an

A paper by Au and Beck (2001) was mentioned during a talk at MCqMC 2018 in Rennes and I checked Probabilistic Engineering Mechanics for details. There is no clear indication that the subset simulation advocated therein is particularly effective. The core idea is to obtain the probability to belong to a small set A by a cascading formula, namely the product of the probability to belong to A¹, then the conditional probability to belong to A² given A¹, &tc. When the subsets A¹, A², …, A constitute a decreasing embedded sequence. The simulation conditional on being in one of the subsets $A^i$ is operated by a random-walk Metropolis-within-Gibbs scheme, with an additional rejection when the value is not in the said subset. (Surprisingly, the authors re-establish the validity of this scheme.) Hence the proposal faces similar issues as nested sampling, except that the nested subsets here are defined quite differently as they are essentially free, provided they can be easily evaluated. Each of the random walks need be scaled, the harder a task because this depends on the corresponding subset volume. The subsets $A^i$ themselves are rarely defined in a natural manner, except when being tail events. And need to be calibrated so that the conditional probability of falling into each remains large enough, the cost of free choice. The Markov chain on the previous subset $A^i$ can prove useful to build the next subset $A^{i+1}$, but there is no general principle behind this remark. (If any, this is connected with X entropy.) But else, the past chains are very much wasted, compared with, say, an SMC treatment of the problem. The paper also notices that starting a Markov chain in the set $A^{i+1}$ means there is no burnin time and hence that the probability estimators are thus unbiased. (This creates a correlation between successive Markov chains, but I think it could be ignored if the starting point was chosen at random or after a random number of extra steps.) The authors further point out that the chain may fail to be ergodic, if the proposal distribution lacks energy to link connected regions of the current subset $A^i$. They suggest using multiple chains with multiple starting points, which alleviates the issue only to some extent, as it ultimately depends on the spread of the starting points. As acknowledged in the paper.

## cyclic riddle [not in NYC!]

Posted in Kids, R, Running, Travel with tags , , , , , , , , , , on December 29, 2017 by xi'an

In the riddle of this week on fivethirtyeight, the question is to find the average number of rounds when playing the following game: P=6 players sitting in a circle each have B=3 coins and with probability 3⁻¹ they give one coin to their right or left side neighbour, or dump the coin to the centre. The game ends when all coins are in the centre. Coding this experiment in R is rather easy

```situz=rep(B,P)
r=1
while (max(situz)>0){
unz=runif(P)
newz=situz-(situz>0)
for (i in (1:P)[unz<1/3])
newz[i%%P+1]=newz[i%%P+1]+(situz[i]>0)
for (i in (1:P)[unz>2/3])
newz[i-1+P*(i==1)]=newz[i-1+P*(i==1)]+(situz[i]>0)
situz=newz
r=r+1}
```

resulting in an average of 15.58, but I cannot figure out (quickly enough) an analytic approach to the problem. (The fit above is to a Gamma(â-1,ĝ) distribution.)

In a completely unrelated aparté (aside), I read earlier this week that New York City had prohibited the use of electric bikes. I was unsure of what this meant (prohibited on sidewalks? expressways? metro carriages?) so rechecked and found that electric bikes are simply not allowed for use anywhere in New York City. Because of the potential for “reckless driving”. The target is apparently the high number of delivery e-bikes, but this ban sounds so absurd that I cannot understand it passed. Especially when De Blasio has committed the city to the Paris climate agreement after Trump moronically dumped it… Banning the cars would sound much more in tune with this commitment! (A further aparté is that I strongly dislike e-bikes, running on nuclear power plants,  especially when they pass me on sharp hills!, but they are clearly a lesser evil when compared with mopeds and cars.)

## splitting a field by annealing

Posted in Kids, pictures, R, Statistics with tags , , , , , , , , on October 18, 2017 by xi'an

A recent riddle [from The Riddle] that I pondered about during a [long!] drive to Luxembourg last weekend was about splitting a square field into three lots of identical surface for a minimal length of separating wire… While this led me to conclude that the best solution was a T like separation, I ran a simulated annealing R code on my train trip to AutransValence, seemingly in agreement with this conclusion.I discretised the square into n² units and explored configurations by switching two units with different colours, according to a simulated annealing pattern (although unable to impose connectivity on the three regions!):

```partz=matrix(1,n,n)
partz[,1:(n/3)]=2;partz[((n/2)+1):n,((n/3)+1):n]=3
#counting adjacent units of same colour
nood=hood=matrix(4,n,n)
for (v in 1:n2) hood[v]=bourz(v,partz)
minz=el=sum(4-hood)
for (t in 1:T){
colz=sample(1:3,2) #picks colours
a=sample((1:n2)[(partz==colz[1])&(hood<4)],1)
b=sample((1:n2)[(partz==colz[2])&(hood<4)],1)
partt=partz;partt[b]=colz[1];partt[a]=colz[2]
#collection of squares impacted by switch
nood=hood
voiz=unique(c(a,a-1,a+1,a+n,a-n,b-1,b,b+1,b+n,b-n))
voiz=voiz[(voiz>0)&(voiz<n2)]
for (v in voiz) nood[v]=bourz(v,partt)
if (nood[a]*nood[b]>0){
difz=sum(nood)-sum(hood)
if (log(runif(1))<difz^3/(n^3)*(1+log(10*rep*t)^3)){
el=el-difz;partz=partt;hood=nood
if (el<minz){ minz=el;cool=partz}
}}}
```

(where bourz computes the number of neighbours), which produces completely random patterns at high temperatures (low t) and which returns to the T configuration (more or less):if not always, as shown below:Once the (a?) solution was posted on The Riddler, it appeared that one triangular (Y) version proved better than the T one [if not started from corners], with a gain of 3% and that a curved separation was even better with an extra gain less than 1% [solution that I find quite surprising as straight lines should improve upon curved ones…]

## Monte Carlo calculations of the radial distribution functions for a proton-electron plasma

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , on October 11, 2017 by xi'an

“In conclusion, the Monte Carlo method of calculating radial distribution functions in a plasma is a feasible approach if significant computing time is available (…) The results indicate that at least 10000 iterations must be completed before the system can be considered near to its equilibrium state, and for a badly chosen starting configuration, the run would need to be considerably longer (…) for more conclusive results a longer run is needed so that the energy of the system can settle into an equilibrium pattern and steady-state radial distribution functions can be obtained.” A.A. Barker

Looking for the history behind Barker’s formula the other day made me look for the original 1965 paper. Which got published in the Australian Journal of Physics at the beginning of Barker’s PhD at the University of Adelaide.

As shown in the above screenshot, the basis  of Barker’s algorithm is indeed Barker’s acceptance probability, albeit written in a somewhat confusing way since the current value of the chain is kept if a Uniform variate is smaller than what is actually the rejection probability. No mistake there! And more interestingly, Barker refers to Wood and Parker (1957) for the “complete and rigorous theory” behind the method. (Both Wood and Parker being affiliated with Los Alamos Scientific Laboratory, while Barker acknowledges support from both the Australian Institute of Nuclear Science and Engineering and the Weapons Research Establishment, Salisbury… This were times when nuclear weapon research was driving MCMC. Hopefully we will not come back to such times. Or, on the pessimistic side, we will not have time to come back to such times!)

As in Metropolis et al. (1953), the analysis is made on a discretised (finite) space, building the Markov transition matrix, stating the detailed balance equation (called microscopic reversibility). Interestingly, while Barker acknowledges that there are other ways of assigning the transition probability, his is the “most rapid” in terms of mixing. And equally interestingly, he discusses the scale of the random walk in the [not-yet-called] Metropolis-within-Gibbs move as major, targetting 0.5 as the right acceptance rate, and suggesting to adapt this scale on the go. There is also a side issue that is apparently not processed with all due rigour, namely the fact that the particles in the system cannot get arbitrarily close from one another. It is unclear how a proposal falling below this distance is processed by Barker’s algorithm. When implemented on 32 particles, this algorithm took five hours to execute 6100 iterations. With a plot of the target energy function that does not shout convergence, far from it! As acknowledged by Barker himself (p.131).

The above quote is from the conclusion and its acceptance of the need for increased computing times comes as a sharp contrast with this week when one of our papers was rejected based on this very feature..!