## Bernoulli factory on a budget

Posted in Books, Kids with tags , , on May 23, 2023 by xi'an

A form of Bernoulli factory with limited energy from The Riddler: being given the choice of 0<p<1, what is the minimal number n of throws such that the outcomes of n iid B(p) draws can be partitioned into six groups of equal probability? Running a brute force R code [in the train to Cambridge] and checking random partitions for different values of p and n, I found a near fit for n=6 and p around 0.42. With partition weights

0.1658159 0.1669405 0.1657981 0.1662112 0.1682599 0.1669745

while missing the n=5 case for p≈0.3034.

## grasshoppers for pythons

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , on May 18, 2023 by xi'an

Following my earlier post on the terrible performances of ChatGPT for a grasshopping riddle, Bob Carpenter sent me his interaction with GPT4, using the same entry. Here is the python code outcome he obtained. The code is running  and the answer is correct.

BC: Thanks. Can you write a python program to simulate the final position of the grasshopper to within machine tolerance using single-precision floating point arithmetic?

GPT4: Sure, here’s a simple Python program that simulates the grasshopper’s journey using single-precision floating point arithmetic, specifically the float32 data type from the numpy library. It continues until the jump size is less than the machine epsilon for float32, which is the smallest representable positive number such that 1.0 + eps != 1.0.

import numpy as np
import random

# Initialize position and jump size
position = np.float32(0.0)
jump = np.float32(0.5)

# Get machine epsilon for float32
eps = np.finfo(np.float32).eps

# Perform jumps until jump size is less than machine epsilon
while jump > eps:
# Choose direction randomly
direction = random.choice([-1, 1])
# Update position
position += direction * jump
# Halve the jump size
jump /= 2

Remember that this program only simulates a single instance of the grasshopper’s journey. If you ran it multiple times, you would get different results each time due to the randomness in the choice of direction. However, on average, you should expect the final positions to be around 0, as per our mathematical analysis earlier. The absolute distance from the origin, however, will vary and, on average, should be close to 1/2.
print(“Final position:”, position)

BC: can you write a program to calculate the expected length of the final position from the origin?

import numpy as np
import random

# Define the number of simulations
num_simulations = 100000

# Initialize an array to store the final positions
final_positions = np.zeros(num_simulations, dtype=np.float32)

# Perform each simulation
for i in range(num_simulations):
# Initialize position and jump size
position = np.float32(0.0)
jump = np.float32(0.5)

# Get machine epsilon for float32
eps = np.finfo(np.float32).eps

# Perform jumps until jump size is less than machine epsilon
while jump > eps:
# Choose direction randomly
direction = random.choice([-1, 1])
# Update position
position += direction * jump
# Halve the jump size
jump /= 2

# Store the final position
final_positions[i] = position

# Calculate the expected length of the final position from the origin
expected_length = np.mean(np.abs(final_positions))

print(“Expected length of the final position from the origin:”, expected_length)

## operation precisely impossible

Posted in Books, Kids, R, University life with tags , , , , , , , , , , , , on May 13, 2023 by xi'an

Since the solution to the previous riddle from The Riddler on the maximum of  different terms in the composed operation

a∅b∅c∅d∅e∅f

depending on the bracketing ordering and the meaning of each ∅ among one of the six elementary operations got posted today as 974,860, I got back to my R code to understand why it differed from my figures by two orders of magnitude and realised I was overly trusting the R function unique. As it was returning more “different” entries than it should have, especially when choosing the six starting numbers (a,…,f) as Uniform (0,1). Using integers instead led for instance to 946,558, which was not so far from the target. But still imprecise as to whether or not some entries had been counted several times. I mentioned the issue to Robin, who rose to the challenge and within minutes came up with using the R function almost.unique from the CRAN package bazar, then producing outcomes like 974,513, hence quite close to 974,860 for random initialisations!

## operation impossible

Posted in Books, R with tags , , , , , , on April 30, 2023 by xi'an

A riddle from The Riddler on how many different numbers one could at most produce from six initial values and the four basic operations. In other words, how many values could the terms in

a∅(b∅{c∅[d∅(e∅f)]})

and other positioning of the brackets could take? (With each ∅ being one of the four operations and a,…,f the initial values or a permutation of these.) A very crude evaluation leads to twenty million possible values, forgetting that addition and multiplication are commutative, while subtraction and division are anti-commutative. I tried a brute force approach in R, rather than exploring the tree of possible paths, but could not approach this figure by far, the number of different values still increasing for the largest manageable number of replicas I could try. Reducing the initial values at n=3, I could come closer to 123 with 95 different values and, for n=4, not too far from 1972 with 1687 values. Moving to n=6 by hopefully exhausting all (?) entries led to another guess of 50,524,809 values. I am however unsure that the R code is able to spot identical values, given the variability of the above figure…

## simulated annealing and logistic regression to the max

Posted in Kids, R, Statistics with tags , , , , , , on April 26, 2023 by xi'an

A Riddler puzzle on the three binary and sequential questions one should ask three players hiding their respective U(0,1) realisation, U, V, and W, to best guess which player holds the largest number, max{U,V,W}. Assuming questions of the type Is U<b¹, &tc., the challenge boils down to selecting seven bounds (one for U, two for V, and four for W) in order to optimise the probability of picking the right player. Which means for a given collection of such bounds to learn this probability from the three binary answers. These can be turned into eight (2³) binary variables and I used them as entries in a logistic regression to predict that W was larger than max(U,V), itself predicted by the first two answers. The optimisation of the bounds can then be achieved by simulated annealing (or otherwise) and the approach returns (random) outputs like the following bounds (one on U, two on V, and four on W)

`0.616 0.434 0.830 0.350 0.736 0.913 0.796 0.827`

for an estimated probability of 0.827. This is a somewhat coherent sequence of bounds when considering the simpler case of two players. Indeed, with three bounds, the probability of winning can be readily derived as

$b^2(b^1-b^2)+1/2+b^3(1-b^2+b^1)-b^1b^1$

logically optimised by (b¹,b²,b³)=(1/2,1/4,3/4) for a success probability of 0.875. And it coïncides with the solution posted by The Riddler, although there is no intuition behind the figures, contrary to the two player situation. In fact, I am surprised that the bound on W does not equate the expectation of max{U,V} under the current conditions:

```> x=runif(1e6,0,.616);y=runif(1e6,0,.434)
> mean(y+(x-y>0)*(x-y))
[1] 0.3591648
```