Archive for the R Category

grasshoppers for pythons

Posted in Books, Kids, pictures, R, Statistics on May 18, 2023 by xi'an

Following my earlier post on the terrible performance of ChatGPT on a grasshopper riddle, Bob Carpenter sent me his interaction with GPT4, using the same prompt. Here is the Python code he obtained: the code runs and the answer is correct.

BC: Thanks. Can you write a python program to simulate the final position of the grasshopper to within machine tolerance using single-precision floating point arithmetic?

GPT4: Sure, here’s a simple Python program that simulates the grasshopper’s journey using single-precision floating point arithmetic, specifically the float32 data type from the numpy library. It continues until the jump size is less than the machine epsilon for float32, which is the smallest representable positive number such that 1.0 + eps != 1.0.

import numpy as np
import random

# Initialize position and jump size
position = np.float32(0.0)
jump = np.float32(0.5)

# Get machine epsilon for float32
eps = np.finfo(np.float32).eps

# Perform jumps until jump size is less than machine epsilon
while jump > eps:
    # Choose direction randomly
    direction = random.choice([-1, 1])
    # Update position
    position += direction * jump
    # Halve the jump size
    jump /= 2

print("Final position:", position)

Remember that this program only simulates a single instance of the grasshopper's journey. If you ran it multiple times, you would get different results each time due to the randomness in the choice of direction. However, on average, you should expect the final positions to be around 0, as per our mathematical analysis earlier. The absolute distance from the origin, however, will vary and, on average, should be close to 1/2.
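Incidentally, the 1/2 figure can be checked exactly rather than by simulation (a sketch of my own, not part of the exchange): after K jumps the position is a sum ±2⁻¹ ± 2⁻² ± … ± 2⁻ᴷ, and enumerating all 2ᴷ sign sequences shows that the mean absolute position is exactly 1/2 for every K:

```python
from itertools import product

K = 10  # number of jumps; 2**K sign sequences, small enough to enumerate

# Enumerate every sequence of up/down choices and record the final position
positions = [
    sum(sign * 2.0 ** -(k + 1) for k, sign in enumerate(signs))
    for signs in product([-1, 1], repeat=K)
]

# All sums are exact dyadic rationals, so the average is exact as well
mean_abs = sum(abs(p) for p in positions) / len(positions)
print(mean_abs)
```

The 2ᴷ final positions are the odd multiples of 2⁻ᴷ between −1 and 1, so the truncated position is uniform over an equally spaced grid and the limit is uniform on (−1,1), with mean absolute value 1/2.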

BC: can you write a program to calculate the expected length of the final position from the origin?

import numpy as np
import random

# Define the number of simulations
num_simulations = 100000

# Initialize an array to store the final positions
final_positions = np.zeros(num_simulations, dtype=np.float32)

# Perform each simulation
for i in range(num_simulations):
    # Initialize position and jump size
    position = np.float32(0.0)
    jump = np.float32(0.5)

    # Get machine epsilon for float32
    eps = np.finfo(np.float32).eps

    # Perform jumps until jump size is less than machine epsilon
    while jump > eps:
        # Choose direction randomly
        direction = random.choice([-1, 1])
        # Update position
        position += direction * jump
        # Halve the jump size
        jump /= 2

    # Store the final position
    final_positions[i] = position

# Calculate the expected length of the final position from the origin
expected_length = np.mean(np.abs(final_positions))

print("Expected length of the final position from the origin:", expected_length)
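The explicit loop above is slow in pure Python; a vectorised rewrite of the same Monte Carlo estimate (my own sketch, with an assumed truncation at K=30 jumps, far finer than float32 resolution) runs in a fraction of a second:

```python
import numpy as np

rng = np.random.default_rng(42)
num_simulations = 100_000
K = 30  # 2**-30 is already far below any visible resolution

# One row per simulated grasshopper, one column per jump
signs = rng.choice([-1.0, 1.0], size=(num_simulations, K))
weights = 2.0 ** -np.arange(1, K + 1)  # jump sizes 1/2, 1/4, 1/8, ...
final_positions = signs @ weights

expected_length = np.mean(np.abs(final_positions))
print("Expected length:", expected_length)  # close to the theoretical 1/2
```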

operation precisely impossible

Posted in Books, Kids, R, University life on May 13, 2023 by xi'an

Since the solution to the previous riddle from The Riddler on the maximum number of different terms in the composed operation

a∅b∅c∅d∅e∅f

depending on the bracketing ordering and on the meaning of each ∅ among the six elementary operations was posted today as 974,860, I got back to my R code to understand why it differed from my figure by two orders of magnitude, and realised I had been overly trusting of the R function unique. It was returning more "different" entries than it should have, especially when the six starting numbers (a,…,f) were chosen as Uniform (0,1) variates. Using integers instead led for instance to 946,558, which was not so far from the target, but still left it unclear whether some entries had been counted several times. I mentioned the issue to Robin, who rose to the challenge and within minutes came up with the R function almost.unique from the CRAN package bazar, producing outcomes like 974,513, hence quite close to 974,860, for random initialisations!
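The underlying trap is that exact comparison treats values differing only in their last floating-point bits as distinct. A minimal Python illustration of my own (with rounding as a crude stand-in for the tolerance used by almost.unique):

```python
# Two ways of computing the "same" value that differ in their last bits
a = 0.1 + 0.2   # 0.30000000000000004 in double precision
b = 0.3
values = [a, b, 1.0, 1.0 + 1e-15]

exact_count = len(set(values))  # exact comparison sees 4 distinct values

# Tolerance-based deduplication: round before comparing
tol_digits = 9
almost_unique = {round(v, tol_digits) for v in values}
tol_count = len(almost_unique)  # near-duplicates collapse to 2 values

print(exact_count, tol_count)
```

The same effect explains why switching to integer inputs in the R code already moved the count much closer to the target.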

operation impossible

Posted in Books, R on April 30, 2023 by xi'an

A riddle from The Riddler on how many different numbers one could at most produce from six initial values and the four basic operations. In other words, how many values could the terms in

a∅(b∅{c∅[d∅(e∅f)]})

and other positionings of the brackets take? (With each ∅ being one of the four operations and a,…,f the initial values or a permutation of these.) A very crude evaluation leads to twenty million possible values, forgetting that addition and multiplication are commutative, while subtraction and division are anti-commutative. I tried a brute-force approach in R, rather than exploring the tree of possible paths, but could not come anywhere near this figure, the number of different values still increasing for the largest manageable number of replicas I could try. Reducing the initial values to n=3, I could come closer to the target of 123 with 95 different values and, for n=4, not too far from 1972 with 1687 values. Moving to n=6 by hopefully exhausting all (?) entries led to another guess of 50,524,809 values. I am however unsure that the R code is able to spot identical values, given the variability of the above figure…
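For the smallest case n=3, exhaustive enumeration is cheap, and exact rational arithmetic sidesteps the duplicate-spotting problem entirely. A Python sketch of my own (the post's actual code is in R), using three arbitrary distinct integer inputs:

```python
from fractions import Fraction
from itertools import permutations, product

def safe(op, x, y):
    """Apply one of the four basic operations; None signals division by zero."""
    if op == "+": return x + y
    if op == "-": return x - y
    if op == "*": return x * y
    return None if y == 0 else x / y

values = (Fraction(2), Fraction(3), Fraction(5))  # arbitrary distinct integers
ops = "+-*/"
results = set()

# Three operands admit two bracketings, six orderings, and 16 operator pairs
for a, b, c in permutations(values):
    for o1, o2 in product(ops, repeat=2):
        # Bracketing 1: (a o1 b) o2 c
        t = safe(o1, a, b)
        if t is not None:
            r = safe(o2, t, c)
            if r is not None:
                results.add(r)
        # Bracketing 2: a o1 (b o2 c)
        t = safe(o2, b, c)
        if t is not None:
            r = safe(o1, a, t)
            if r is not None:
                results.add(r)

print(len(results))  # distinct exact values for these three inputs
```

Since Fractions hash by exact value, no two entries of the set can be secretly equal, unlike with floating-point comparisons.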

simulated annealing and logistic regression to the max

Posted in Kids, R, Statistics on April 26, 2023 by xi'an

A Riddler puzzle on the three binary and sequential questions one should ask three players hiding their respective U(0,1) realisations, U, V, and W, in order to best guess which player holds the largest number, max{U,V,W}. Assuming questions of the type "Is U<b¹?", etc., the challenge boils down to selecting seven bounds (one for U, two for V, and four for W) in order to optimise the probability of picking the right player, which means learning this probability from the three binary answers for a given collection of bounds. These answers can be turned into eight (2³) binary variables, which I used as entries in a logistic regression to predict whether W was larger than max(U,V), itself predicted by the first two answers. The optimisation of the bounds can then be achieved by simulated annealing (or otherwise) and the approach returns (random) outputs like the following bounds (one on U, two on V, and four on W)

0.616 0.434 0.830 0.350 0.736 0.913 0.796 0.827

for an estimated probability of 0.827. This is a somewhat coherent sequence of bounds when considering the simpler case of two players. Indeed, with three bounds, the probability of winning can be readily derived as

b^2(b^1-b^2)+\frac{1}{2}+b^3(1-b^3+b^1)-(b^1)^2

logically optimised by (b¹,b²,b³)=(1/2,1/4,3/4) for a success probability of 0.875. And it coïncides with the solution posted by The Riddler, although there is no intuition behind the figures, contrary to the two-player situation. In fact, I am surprised that the bound on W does not equal the expectation of max{U,V} under the current conditions:

> x=runif(1e6,0,.616);y=runif(1e6,0,.434)
> mean(y+(x-y>0)*(x-y))
[1] 0.3591648
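Going back to the two-player case, a quick numeric check of my own confirms that the success probability b²(b¹−b²)+½+b³(1−b³+b¹)−(b¹)², with b³ the bound used when the first answer is U≥b¹, is indeed maximised at (1/2,1/4,3/4) with value 0.875:

```python
import itertools

def win_prob(b1, b2, b3):
    # Two-player success probability as a function of the three bounds
    return b2 * (b1 - b2) + 0.5 + b3 * (1 - b3 + b1) - b1 ** 2

# Value at the claimed optimum
p_star = win_prob(0.5, 0.25, 0.75)

# Crude grid search over the unit cube; the quadratic is strictly concave,
# so the unique maximiser lies on this grid
grid = [i / 100 for i in range(101)]
best = max(itertools.product(grid, repeat=3), key=lambda b: win_prob(*b))

print(p_star, best)  # 0.875 at (0.5, 0.25, 0.75)
```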

powering a probability [a Bernoulli factory tale]

Posted in Books, R, Statistics on April 21, 2023 by xi'an

Starting from an X validated question on finding an unbiased estimator of an integral raised to a non-integer power, I came across a somewhat interesting Bernoulli factory solution! Thanks to Peter Occil's encyclopedic record of cases, pointing to Mendo's (2019) solution for functions of ρ that can be expressed as power series. Like ρ^γ since

(1-[1-\rho])^\gamma=1-\gamma(1-\rho)+\frac{\gamma(\gamma-1)(1-\rho)^2}{2}-\cdots

which rather magically turns into the reported algorithm

Set k=1
Repeat the following process, until it returns a value x:
 1. Generate a Bernoulli B(ρ) variate z; if z=1, return x=1
 2. Else, with probability γ/k, return x=0
 3. Else, set k=k+1 and go back to step 1.

since

\rho^\gamma=\rho+(1-\rho)(1-\gamma)\big\{\rho+\frac{(1-\rho)(2-\gamma)}{2}\big[\rho+\cdots
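A short Python simulation of this factory (my own sketch, replacing the B(ρ) coin by a uniform test against a known ρ) confirms that the output frequency matches ρ^γ:

```python
import random

def bernoulli_factory_power(rho, gamma, rng):
    """Mendo-type factory: returns 1 with probability rho**gamma."""
    k = 1
    while True:
        if rng.random() < rho:        # the Bernoulli(rho) coin
            return 1
        if rng.random() < gamma / k:  # stop and return 0 with prob gamma/k
            return 0
        k += 1

rng = random.Random(2023)
rho, gamma = 0.3, 0.5
n = 200_000
freq = sum(bernoulli_factory_power(rho, gamma, rng) for _ in range(n)) / n

print(freq, rho ** gamma)  # both close to 0.3**0.5 ≈ 0.5477
```

In a genuine Bernoulli factory setting ρ is of course unknown and only the coin flips are observed; the uniform test here merely stands in for that coin.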
