## Bernoulli factory on a budget

Posted in Books, Kids with tags , , on May 23, 2023 by xi'an

A form of Bernoulli factory with limited energy from The Riddler: being given the choice of 0<p<1, what is the minimal number n of throws such that the outcomes of n iid B(p) draws can be partitioned into six groups of equal probability? Running a brute force R code [in the train to Cambridge] and checking random partitions for different values of p and n, I found a near fit for n=6 and p around 0.42. With partition weights

0.1658159 0.1669405 0.1657981 0.1662112 0.1682599 0.1669745

while missing the n=5 case for p≈0.3034.

## grasshoppers for pythons

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , on May 18, 2023 by xi'an

Following my earlier post on the terrible performances of ChatGPT for a grasshopping riddle, Bob Carpenter sent me his interaction with GPT4, using the same entry. Here is the python code outcome he obtained. The code is running  and the answer is correct.

BC: Thanks. Can you write a python program to simulate the final position of the grasshopper to within machine tolerance using single-precision floating point arithmetic?

GPT4: Sure, here’s a simple Python program that simulates the grasshopper’s journey using single-precision floating point arithmetic, specifically the float32 data type from the numpy library. It continues until the jump size is less than the machine epsilon for float32, which is the smallest representable positive number such that 1.0 + eps != 1.0.

import numpy as np
import random

# Initialize position and jump size
position = np.float32(0.0)
jump = np.float32(0.5)

# Get machine epsilon for float32
eps = np.finfo(np.float32).eps

# Perform jumps until jump size is less than machine epsilon
while jump > eps:
# Choose direction randomly
direction = random.choice([-1, 1])
# Update position
position += direction * jump
# Halve the jump size
jump /= 2

Remember that this program only simulates a single instance of the grasshopper’s journey. If you ran it multiple times, you would get different results each time due to the randomness in the choice of direction. However, on average, you should expect the final positions to be around 0, as per our mathematical analysis earlier. The absolute distance from the origin, however, will vary and, on average, should be close to 1/2.
print(“Final position:”, position)

BC: can you write a program to calculate the expected length of the final position from the origin?

import numpy as np
import random

# Define the number of simulations
num_simulations = 100000

# Initialize an array to store the final positions
final_positions = np.zeros(num_simulations, dtype=np.float32)

# Perform each simulation
for i in range(num_simulations):
# Initialize position and jump size
position = np.float32(0.0)
jump = np.float32(0.5)

# Get machine epsilon for float32
eps = np.finfo(np.float32).eps

# Perform jumps until jump size is less than machine epsilon
while jump > eps:
# Choose direction randomly
direction = random.choice([-1, 1])
# Update position
position += direction * jump
# Halve the jump size
jump /= 2

# Store the final position
final_positions[i] = position

# Calculate the expected length of the final position from the origin
expected_length = np.mean(np.abs(final_positions))

print(“Expected length of the final position from the origin:”, expected_length)

## more [grass]hope

Posted in Books, Kids, pictures, Statistics with tags , , , , on May 16, 2023 by xi'an

Following my earlier post on the terrible performances of ChatGPT for a grasshopping riddle, Bob Carpenter sent me his interaction with GPT4, using the same entry. Here is the outcome he obtained. The answer is correct, but the reasoning is wrong.

GPT4: This is a classic problem of infinite series and expected values in probability theory.

Given the nature of the grasshopper’s random walk, at each jump, it has an equal probability (1/2) of hopping to the left or to the right. Thus, we expect that, on average, it won’t be moving away from the origin (0) because the positive and negative jumps should cancel out.

However, you are asking for the expected absolute distance from the origin, which can’t be negative by definition. The key to solving this problem is to recognize that the expected value of the absolute value of a random variable is not the same as the absolute value of its expected value.

Since the absolute value function is not linear, we cannot pull the expected value inside of it. Instead, we need to directly compute the expected value of the absolute distance from the origin, which is a sum of an infinite series.

The nth jump’s absolute distance is 1/(2^n), and it has a probability of 1/2 to go either left or right. Therefore, the expected absolute distance from the origin after the nth jump is (1/2) * 1/(2^n).

Adding up these expected distances for all jumps gives us the total expected absolute distance from the origin:

E = (1/2) * (1/2 + 1/4 + 1/8 + 1/16 + …)

The sum inside the parentheses is a geometric series with a = 1/2 and r = 1/2. The sum of an infinite geometric series is a / (1 – r), so the sum of this series is (1/2) / (1 – 1/2) = 1.

Therefore, E = (1/2) * 1 = 1/2

So on average, the grasshopper is expected to be 1/2 unit away from the origin after infinitely many jumps. The expected distance it travels to return to the origin is thus 1/2 unit.

## grass hopping on the wrong fgpt

Posted in Statistics with tags , , on May 14, 2023 by xi'an

Checking the R coding abilities of ChatGPT, I entered the latest Riddler puzzle that asks for the expected value of the stationary distribution of a slowing-down random walk over the real line when the N-th jump from x is to x±2-N. Which is ½ since the limiting distribution is uniform over (-1,1). The first proposed solution kept returning 0 due to a poor initialisation (and a wrong understanding of the puzzle):

The second attempt does not run (because there is no exact zero in the cumulated distance)and does not make sense (because the random walk has no stopping rule).  The third attempt runs (!) but again misses the point by adding the absolute jump values until a return to zero. And even pointing out the mistake to ChatGPT does not result into a fourth correct solution

## operation precisely impossible

Posted in Books, Kids, R, University life with tags , , , , , , , , , , , , on May 13, 2023 by xi'an

Since the solution to the previous riddle from The Riddler on the maximum of  different terms in the composed operation

a∅b∅c∅d∅e∅f

depending on the bracketing ordering and the meaning of each ∅ among one of the six elementary operations got posted today as 974,860, I got back to my R code to understand why it differed from my figures by two orders of magnitude and realised I was overly trusting the R function unique. As it was returning more “different” entries than it should have, especially when choosing the six starting numbers (a,…,f) as Uniform (0,1). Using integers instead led for instance to 946,558, which was not so far from the target. But still imprecise as to whether or not some entries had been counted several times. I mentioned the issue to Robin, who rose to the challenge and within minutes came up with using the R function almost.unique from the CRAN package bazar, then producing outcomes like 974,513, hence quite close to 974,860 for random initialisations!