## not a Bernoulli factory

Posted in Books, Kids, pictures, R with tags , , , , , , , on May 20, 2020 by xi'an

A Riddler riddle I possibly misunderstood:

Four isolated persons are given four fair coins, which can be either flipped once or returned without being flipped. If all flipped coins come up heads, the team wins! Else, if any comes up tails, or if no flip at all is done, it looses. Each person is further given an independent U(0,1) realisation. What is the best strategy?

Since the players are separated, I would presume the same procedure is used by all. Meaning that a coin is tossed with probability p, ie if the uniform is less than p, and untouched otherwise. The probability of winning is then

4(1-p)³p½+6(1-p)³p½²+4(1-p)p³½³+p⁴½⁴

which is maximum for p=0.3420391, with a winning probability of 0.2848424.

solve $$x⌊x⌊x⌊x⌋⌋⌋=2020$$

Where the integral part is the integer immediately below x. Puzzle that I first fail solving by brute force, because I did not look at negative x’s… Since the fourth root of 2020 is between 6 and 7, the solution is either x=6+ε or x=-7+ε, with ε in (0,1). The puzzle then becomes either

$$(6+ε)⌊(6+ε)⌊(6+ε)⌊6+ε⌋⌋⌋ = (6+ε)⌊(6+ε)⌊36+6ε⌋⌋ = (6+ε)⌊(6+ε)(36+⌊6ε⌋)⌋ = 2020$$

where there are 6 possible integer values for $$⌊6ε⌋, with only ⌊6ε⌋=5 being possible, turning the equation into$$

$$(6+ε)⌊41(6+ε)⌋ = (6+ε)(246+⌊41ε⌋) = 2020$$

where again only $$⌊42ε⌋$$=40 being possible, ending up with

$$1716+286ε = 2020$$

which has no solution in (0,1). In the second case

(-7+$$ε$$)$$⌊(-7+ε)⌊(-7+ε)⌊-7+ε⌋⌋⌋$$ = (-7+$$ε$$)$$⌊(-7+ε)(49+⌊-7ε⌋)⌋ = 2020$$

shows that only $$⌊-7ε⌋$$=-3 is possible, leading to

(-7+$$ε$$)$$⌊46(-7+ε))⌋$$ = (-7+$$ε$$) ($$-322+⌊46ε⌋)=2020$$

with only $$⌊46ε⌋$$=17 possible, hence

2135-305$$ε$$=2020

and

$$ε$$=115/305.

A brute force simulated annealing resolution returns x=-6.622706 after 10⁸ iterations. A more interesting question is to figure out the discontinuity points of the function

$$ℵ(x) = x$$$$⌊x⌊x⌊x⌋⌋⌋$$

as they seem to be numerous:

For instance, only 854 of the first 2020 integers enjoy a solution to $$ℵ(x)$$=n.

## Markov chain quasi-Monte Carlo

Posted in Statistics with tags , , , , on April 29, 2020 by xi'an

“It is known that Tausworthe generators can be viewed as polynomial Korobov lattice point sets with a denominator polynomial p(x) and a numerator polynomial q(x) over IF2

A recently arXived paper by Shin Harase, “A table of short-period Tausworthe generators for Markov chain quasi-Monte Carlo”, discusses the use of [quasi-Monte Carlo] Tausworthe generators rather than iid uniform sampling. As shown by Owen and Tribble, it is indeed legit to replace a sequence of iid (pseudo-random) uniforms with its quasi-Monte Carlo (qMC) version if the sequence keeps a sufficient degree of uniformity. The current paper optimises the parameters of the Tausworthe generators in terms of the t-value of the generator, an indicator of the uniform occupancy of the qMC sequence.

For a range of values of m, if 2m-1 is the period of the pseudo-random generator, the author obtains the optimal weights in the Tausworthe generator, which is a linear feedback shift register generator over {0,1}, ie shifting all the bits of the current uniform realisation by linear combination modulo 2. The comparison with other qMC and MC is provided on a Gibbs sampler for a bidimensional Gaussian target, which presents the advantage of requiring exactly one uniform per simulation and the disadvantage of … requiring exactly one uniform per simulation! Since this is harder to envision for simulation methods requiring a random number of uniforms.

Regarding the complexity of the approach, I do not see any gap between using these Tausworthe generators and something like the Mersenne generator. I just wonder at the choice of m, that is, whether or not it makes sense to pick any value lower than 2³² for the period.

## really random generators [again!]

Posted in Books, Statistics with tags , , , , , , , , , on March 2, 2020 by xi'an

A pointer sent me to Chemistry World and an article therein about “really random numbers“. Or “truly” random numbers. Or “exactly” random numbers. Not particularly different from the (in)famous lava lamp generator!

“Cronin’s team has developed a robot that can automatically grow crystals in a 10 by 10 array of vials, take photographs of them, and use measurements of their size, orientation, and colour to generate strings of random numbers. The researchers analysed the numbers generated from crystals grown in three solutions – including a solution of copper sulfate – and found that they all passed statistical tests for the quality of their randomness.” Chemistry World, 18 February 2020

The validation of this truly random generator is thus exactly the same as a (“bad”) pseudo-random generator, namely that in the law of large number sense, it fits the predicted behaviour. And thus the difference between them cannot be statistical, but rather cryptographic:

“…we considered the encryption capability of this random number generator versus that of a frequently used pseudorandom number generator, the Mersenne Twister.” Lee et al., Matter, February 10, 2020

Meaning that the knowledge of the starting point and of the deterministic transform for the Mersenne Twister makes it feasible to decipher, which is not the case for a physical and non-reproducible generator as the one advocated. One unclear aspect of the proposed generator is the time required to produce 10⁶, even though the authors mention that “the bit-generation rate is significantly lower than that in other methods”.

## biased sample!

Posted in Statistics with tags , , , , , , , , , , , on May 21, 2019 by xi'an

A chance occurrence led me to this thread on R-devel about R sample function generating a bias by taking the integer part of the continuous uniform generator… And then to the note by Kellie Ottoboni and Philip Stark analysing the reason, namely the fact that R uniform [0,1) pseudo-random generator is not perfectly continuously uniform but discrete, by the nature of numbers on a computer. Knuth (1997) showed that in this case the range of probabilities is larger than (1,1), the largest range being (1,1.03). As noted in the note, exploiting directly the pseudo-random bits of the pseudo-random generator. Shocking, isn’t it!  A fast and bias-free alternative suggested by Lemire is available as dqsample::sample

As an update of June 2019, sample is now fixed.

## fiducial simulation

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on April 19, 2018 by xi'an

While reading Confidence, Likelihood, Probability), by Tore Schweder and Nils Hjort, in the train from Oxford to Warwick, I came upon this unexpected property shown by Lindqvist and Taraldsen (Biometrika, 2005) that to simulate a sample y conditional on the realisation of a sufficient statistic, T(y)=t⁰, it is sufficient (!!!) to simulate the components of  y as y=G(u,θ), with u a random variable with fixed distribution, e.g., a U(0,1), and to solve in θ the fixed point equation T(y)=t⁰. Assuming there exists a single solution. Brilliant (like an aurora borealis)! To borrow a simple example from the authors, take an exponential sample to be simulated given the sum statistics. As it is well-known, the conditional distribution is then a (rescaled) Beta and the proposed algorithm ends up being a standard Beta generator. For the method to work in general, T(y) must factorise through a function of the u’s, a so-called pivotal condition which brings us back to my post title. If this condition does not hold, the authors once again brilliantly introduce a pseudo-prior distribution on the parameter θ to make it independent from the u’s conditional on T(y)=t⁰. And discuss the choice of the Jeffreys prior as optimal in this setting even when this prior is improper. While the setting is necessarily one of exponential families and of sufficient conditioning statistics, I find it amazing that this property is not more well-known [at least by me!]. And wonder if there is an equivalent outside exponential families, for instance for simulating a t sample conditional on the average of this sample.