## new order

Posted in Books, Kids, R, Statistics with tags , , , , on February 5, 2021 by xi'an

The latest riddle from The Riddler was both straightforward: given four iid Normal variates, X¹,X²,X³,X⁴, what is the probability that X¹+X²<X³+X⁴ given that X¹<X³ ? The answer is ¾ and it actually does not depend on the distribution of the variates. The bonus question is however much harder: what is this probability when there are 2N iid Normal variates?

I posted the question on math.stackexchange, then on X validated, but received no hint at a possible simplification of the probability. And then erased the questions. Given the shape of the domain where the bivariate Normal density is integrated, it sounds most likely there is no closed-form expression. (None was proposed by the Riddler.) The probability decreases roughly in N³ when computing this probability by simulation and fitting a regression.

```> summary(lm(log(p)~log(r)))

Residuals:
Min        1Q    Median        3Q       Max
-0.013283 -0.010362 -0.000606  0.007835  0.039915

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.111235   0.008577  -12.97 4.11e-13 ***
log(r)      -0.311361   0.003212  -96.94  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01226 on 27 degrees of freedom
Multiple R-squared:  0.9971,	Adjusted R-squared:  0.997
F-statistic:  9397 on 1 and 27 DF,  p-value: < 2.2e-16
```

## easy and uneasy riddles

Posted in Books, Kids, R with tags , , , , , on February 2, 2021 by xi'an

On 15 January, The Riddler had both a straightforward and a challenging riddles. The first one was to optimise the choice of a real number d with the utility function U(d,θ)=d ℑ(θ>d), when θ is Uniform(0,100). Leading unsurprisingly to d=50…

The tough(er) one was to solve a form of sudoku where the 24 entries of a 8×3 table are integers in {1,…,9} and the information is provided by the row-wise and column-wise products of these integers. The vertical margins are

294, 216, 135, 98, 112, 84, 245, 40

and the horizontal margins are

8 890 560, 156 800, 55 566

After an unsuccessful brute-force (and pseudo-annealed) attempt achieving a minimum error of 127, although using the prime factor decompositions of these 11 margins, I realised that some entries were known: e.g., 7 at (1,2), 5 at (3,2), and 7 at (7,3), and (much later) that the (huge) product value for the first column implied that each term in that column had to be the maximal possible value for the corresponding rows, except for 5 on row 7. This leads to the starting grid

```    [,1] [,2] [,3]
[1,]    7    7    6
[2,]    9    0    0
[3,]    9    5    3
[4,]    7    0    0
[5,]    8    0    0
[6,]    7    0    0
[7,]    5    7    7
[8,]    8    0    0
```

and an additional and obvious exclusion based on the absence of 3’s in the second column, of 5’s and 2’s in the third column shows there was a unique solution

```    [,1] [,2] [,3]
[1,]    7    7    6
[2,]    9    8    3
[3,]    9    5    3
[4,]    7    2    7
[5,]    8    2    7
[6,]    7    4    3
[7,]    5    7    7
[8,]    8    5    1
```

as also demonstrated by a complete exploration with R:

Try it online!

## dial e for Buffon

Posted in Books, Kids, Statistics with tags , , , , , , , on January 29, 2021 by xi'an

The use of Buffon’s needle to approximate π by a (slow) Monte Carlo estimate is a well-known Monte Carlo illustration. But that a similar experiment can be used for approximating e seems less known, if judging from the 08 January riddle from The Riddler. When considering a sequence of length n exchangeable random variables, the probability of a particuliar ordering of the sequence is 1/n!. Thus, counting how many darts need be thrown on a target until the distance to the centre increases produces a random number N≥2 with pmf 1/n!-1/(n+1)! and with expectation equal to e. Which can be checked as follows

```p=diff(c(0,1+which(diff(rt(1e5))>0)))
sum((p>1)*((p+1)*(p+2)/2-1)+2*(p==1))```

which recycles simulations by using every one as starting point (codegolfers welcome!).

An earlier post on the ‘Og essentially covered the same notion, also linking it to Forsythe’s method and to Gnedenko. (Rényi could also be involved!) Paradoxically, the extra-credit given to the case when the target is divided into equal distance tori is much less exciting…

## puzzles & riddles

Posted in Books, Kids, R, Statistics with tags , , , , , , , , on January 3, 2021 by xi'an

A rather simplistic game on the Riddler of 18 December:

…two players, each of whom starts with a whole number of points. Players take turns “attacking” each other, which involves subtracting their own number of points from their opponent’s until one of the players is out of points.

Easy to code in R:

```g=function(N,M)ifelse(N<M,-g(M-N,N),1)

f=function(N,M=N){
while(g(N,M)>0)M=M+1
return(M)}
```

which converges to the separating ratio 1.618. If decomposing the actions until one player wins, one gets a sequence of upper and lower bounds associated with the Fibonacci sequence: 1⁻, 2⁺, 3/2⁻, 5/3⁺, 8/5⁻, &tc, converging to the “golden ratio” φ.

As an aside, I also solved a relatively quick codegolf challenge, where the question was to find the sum of all possible binary values from a bitmask. Meaning that for a binary input, e.g., 101X0XX0…01X, with some entries masked by X’s, one had to find the sum of all binary numbers compatible with the input. Which can be solved succinctly by counting the number of X’s, k, and adding the visible bits $2^k$ times and replacing the invisible ones by  $2^{k-1}$. With some help, and using 2 instead of X, my R code moved from 158 bytes to 50:

```function(x)2^sum(a<-x>1)*rev(x/4^a)%*%2^(seq(x)-1)
```

## around the table

Posted in Books, pictures, R, Statistics with tags , , , , , , , , , , on December 2, 2020 by xi'an

The Riddler has a variant on the classical (discrete) random walk around a circle where every state (but the starting point) has the same probability 1/(n-1) to be visited last. Surprising result that stems almost immediately from the property that, leaving from 0, state a is visited couterclockwise before state b>a is visited clockwise is b/a+b. The variant includes (or seems to include) the starting state 0 as counting for the last visit (as a return to the origin). In that case, all n states, including the origin, but the two neighbours of 0, 1, and n-1, have the same probability to be last. This can also be seen on an R code that approximates (inner loop) the probability that a given state is last visited and record how often this probability is largest (outer loop):

```w=0*(1:N)#frequency of most likely last
for(t in 1:1e6){
o=0*w#probabilities of being last
for(v in 1:1e6)#sample order of visits
o[i]=o[i<-1+unique(cumsum(sample(c(-1,1),300,rep=T))%%N)[N]]+1
w[j]=w[j<-order(o)[N]]+1}
```

However, upon (jogging) reflection, the double loop is a waste of energy and

```o=0*(1:N)
for(v in 1:1e8)
o[i]=o[i<-1+unique(cumsum(sample(c(-1,1),500,rep=T))%%N)[N]]+1
```

should be enough to check that all n positions but both neighbours have the same probability of being last visited. Removing the remaining loop should be feasible by considering all subchains starting at one of the 0’s, since this is a renewal state, but I cannot fathom how to code it succinctly. A more detailed coverage of the original problem (that is, omitting the starting point) was published the Monday after publication of the riddle on R bloggers, following a blog post by David Robinson on Variance Explained.

R codegolf challenge: is there a way to shorten the above R for loop in a single line command?!