## around the table

Posted in Books, pictures, R, Statistics with tags , , , , , , , , , , on December 2, 2020 by xi'an The Riddler has a variant on the classical (discrete) random walk around a circle where every state (but the starting point) has the same probability 1/(n-1) to be visited last. Surprising result that stems almost immediately from the property that, leaving from 0, state a is visited couterclockwise before state b>a is visited clockwise is b/a+b. The variant includes (or seems to include) the starting state 0 as counting for the last visit (as a return to the origin). In that case, all n states, including the origin, but the two neighbours of 0, 1, and n-1, have the same probability to be last. This can also be seen on an R code that approximates (inner loop) the probability that a given state is last visited and record how often this probability is largest (outer loop):

```w=0*(1:N)#frequency of most likely last
for(t in 1:1e6){
o=0*w#probabilities of being last
for(v in 1:1e6)#sample order of visits
o[i]=o[i<-1+unique(cumsum(sample(c(-1,1),300,rep=T))%%N)[N]]+1
w[j]=w[j<-order(o)[N]]+1}
```

However, upon (jogging) reflection, the double loop is a waste of energy and

```o=0*(1:N)
for(v in 1:1e8)
o[i]=o[i<-1+unique(cumsum(sample(c(-1,1),500,rep=T))%%N)[N]]+1
```

should be enough to check that all n positions but both neighbours have the same probability of being last visited. Removing the remaining loop should be feasible by considering all subchains starting at one of the 0’s, since this is a renewal state, but I cannot fathom how to code it succinctly. A more detailed coverage of the original problem (that is, omitting the starting point) was published the Monday after publication of the riddle on R bloggers, following a blog post by David Robinson on Variance Explained.

R codegolf challenge: is there a way to shorten the above R for loop in a single line command?!

## the riddle(r) of the certain winner losing in the end

Posted in Books, Kids, R, Statistics with tags , , , , , on November 25, 2020 by xi'an Considering a binary random walk, starting at zero, what is the probability of being almost sure of winning at some point only to lose at the end? This is the question set by the post-election Riddler, with almost sure meaning above 99% and the time horizon set to n=101 steps (it could have been 50 or 538!). As I could not see a simple way to compute the collection of states with a probability of being positive at the end of at least 0.99, even after checking William Feller’s Random Walks fabulous chapter, I wrote an R code to find them, and then ran a Monte Carlo evaluation of the probability to reach this collection and still end up with a negative value. Which came as 0.00212 over repeated simulations. Obviously smaller than 0.01, but no considerably so. As seen on the above picture, the set to be visited is actually not inconsiderable. The bounding curves are the diagonal and the 2.33 √(n-t) bound derived from the limiting Brownian approximation to the random walk, which fits rather well. (I wonder if there is a closed form expression for the probability of the Brownian hitting the boundary 2.33 √(n-t). Simulations with 1001 steps give an estimated probability of 0.505, leading to a final probability of 0.00505 of getting over the boundary and loosing in the end, close to the 1/198 produced by The Riddler.)

## one or two?

Posted in Books, Kids, R with tags , , , , , , on March 12, 2020 by xi'an

A superposition of two random walks from The Riddler:

Starting from zero, a random walk is produced by choosing moves between ±1 and ±2 at each step. If the choice between both is made towards maximising the probability of ending up positive after 100 steps, what is this probability?

Although the optimal path is not necessarily made of moves that optimise the probability of ending up positive after the remaining steps, I chose to follow a dynamic programming approach by picking between ±1 and ±2 at each step based on that probability:

```bs=matrix(0,405,101) #best stategy with value i-203 at time j-1
bs[204:405,101]=1
for (t in 100:1){
tt=2*t
bs[203+(-tt:tt),t]=.5*apply(cbind(
bs[204+(-tt:tt),t+1]+bs[202+(-tt:tt),t+1],
bs[201+(-tt:tt),t+1]+bs[205+(-tt:tt),t+1]),1,max)}
```

resulting in the probability

```> bs[203,1]
 0.6403174
```

Just checking that a simple strategy of picking ±1 above zero and ±2 below leads to the same value

```ga=rep(0,T)
for(v in 1:100) ga=ga+(1+(ga<1))*sample(c(-1,1),T,rep=TRUE)
```

or sort of

```> mean(ga>0)
 0.6403494
```

With highly similar probabilities when switching at ga<2

```> mean(ga>0)
 0.6403183
```

or ga<0

```> mean(ga>0)
 0.6403008
```

and too little difference to spot a significant improvement between the three boundaries.

## an elegant sampler

Posted in Books, Kids, R, University life with tags , , , , , , , on January 15, 2020 by xi'an Following an X validated question on how to simulate a multinomial with fixed average, W. Huber produced a highly elegant and efficient resolution with the compact R code

```tabulate(sample.int((k-1)*n, s-n) %% n + 1, n) + 1
```

where k is the number of classes, n the number of draws, and s equal to n times the fixed average. The R function sample.int is an alternative to sample that seems faster. Breaking the outcome of

```sample.int((k-1)*n, s-n)
```

as nonzero positions in an n x (k-1) matrix and adding a adding a row of n 1’s leads to a simulation of integers between 1 and k by counting the 1’s in each of the n columns, which is the meaning of the above picture. Where the colour code is added after counting the number of 1’s. Since there are s 1’s in this matrix, the sum is automatically equal to s. Since the s-n positions are chosen uniformly over the n x (k-1) locations, the outcome is uniform. The rest of the R code is a brutally efficient way to translate the idea into a function. (By comparison, I brute-forced the question by suggesting a basic Metropolis algorithm.)