## back to the Bernoulli factory

Posted in Books, Statistics, University life with tags , , , on April 7, 2020 by xi'an

“The results show that the proposed algorithm is asymptotically optimal for the mentioned subclass of functions, in the sense that for any other fast algorithm E[N] grows at least as fast with p.”

Murray Pollock (Warwick U. for a wee more days!) pointed out to me this paper of Luis Mendo on a Bernoulli factory algorithm that estimates functions [of p] that can be expressed as power series [of p]. Essentially functions f(p) such that f(0)=0 and f(1)=1. The big difference with earlier algorithms, as far as I can tell, is that the approach involves a randomised stopping rule that involves, on top of the unlimited sequence of Bernoulli B(p) variates a second sequence of Uniform variates, which sounds to me like a change of paradigm, given the much higher degree of freedom brought by Uniform variates (as opposed to Bernoulli variates with an unknown value of p). Although there exists a non-randomised version in the paper. The proposed algorithm is as follows, using a sequence of d’s issued from the power series coefficients:

1. Set i=1.
2. Take one input X[i].
3. Produce U[i] uniform on (0,1). Let V[i]=1 if U[i]<d[i] and V[i]=0 otherwise.
If V[i] or X[i] are equal to 1, output X[i] and finish.
Else increase i and go back to step2.

As the author mentions, this happens to be a particular case of the reverse-time martingale approach of Łatuszynski, Kosmidis, Papaspiliopoulos and Roberts (Warwick connection as well!). With an average number of steps equal to f(p)/p, surprisingly simple, and somewhat of an optimal rate. While the functions f(p) are somewhat restricted, this is nice work

## revisiting the Gelman-Rubin diagnostic

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on January 23, 2019 by xi'an

Just before Xmas, Dootika Vats (Warwick) and Christina Knudson arXived a paper on a re-evaluation of the ultra-popular 1992 Gelman and Rubin MCMC convergence diagnostic. Which compares within-variance and between-variance on parallel chains started from hopefully dispersed initial values. Or equivalently an under-estimating and an over-estimating estimate of the MCMC average. In this paper, the authors take advantage of the variance estimators developed by Galin Jones, James Flegal, Dootika Vats and co-authors, which are batch mean estimators consistently estimating the asymptotic variance. They also discuss the choice of a cut-off on the ratio R of variance estimates, i.e., how close to one need it be? By relating R to the effective sample size (for which we also have reservations), which gives another way of calibrating the cut-off. The main conclusion of the study is that the recommended 1.1 bound is too large for a reasonable proximity to the true value of the Bayes estimator (Disclaimer: The above ABCruise header is unrelated with the paper, apart from its use of the Titanic dataset!)

In fact, I have other difficulties than setting the cut-off point with the original scheme as a way to assess MCMC convergence or lack thereof, among which

1. its dependence on the parameterisation of the chain and on the estimation of a specific target function
2. its dependence on the starting distribution which makes the time to convergence not absolutely meaningful
3. the confusion between getting to stationarity and exploring the whole target
4. its missing the option to resort to subsampling schemes to attain pseudo-independence or scale time to convergence (albeit see 3. above)
5. a potential bias brought by the stopping rule.

## a funny mistake

Posted in Statistics with tags , , , , , , , , , , , on August 20, 2018 by xi'an

While watching the early morning activity in Tofino inlet from my rental desk, I was looking at a recent fivethirthyeight Riddle, which consisted in finding the probability of stopping a coin game which rule was to wait for the n consecutive heads if (n-1) consecutive heads had failed to happen when requested, which is

p+(1-p)p²+(1-p)(1-p²)p³+…

or

$q=\sum_{k=1}^\infty p^k \prod_{j=1}^{k-1}(1-p^j)$

While the above can write as

$q=\sum_{k=1}^\infty \{1-(1-p^k)\} \prod_{j=1}^{k-1}(1-p^j)$

or

$\sum_{k=1}^\infty \prod_{j=1}^{k-1}(1-p^j)-\prod_{j=1}^{k}(1-p^j)$

hence suggesting

$q=\sum_{k=1}^\infty \prod_{j=1}^{k-1}(1-p^j) - \sum_{k=2}^\infty \prod_{j=1}^{k-1}(1-p^j) =1$

the answer is (obviously) false and the mistake in separating the series into a difference of series is that both terms are infinite. The correct answer is actually

$q=1-\prod_{j=1}^{\infty}(1-p^j)$

which is Euler’s function. Maybe nonstandard analysis can apply to go directly from the difference of the infinite series to the answer!

## Bayesian decision riddle

Posted in Books, Kids, Statistics with tags , , , , on June 15, 2017 by xi'an

The current puzzle on The Riddler is a version of the secretary problem with an interesting (?) Bayesian solution.

Given four positive numbers x¹, x², x³, x⁴, observed sequentially, the associated utility is the value of x at the stopping time. What is the optimal stopping rule?

While nothing is mentioned about the distribution of the x’s, I made the assumption that they were iid and uniformly distributed over (0,M), with M unknown and tried a Bayesian resolution with the non-informative prior π(M)=1/M. And failed. The reason for this failure is that the expected utility is infinite at the first step: while the posterior expected utility is finite with three and two observations, meaning I can compare stopping and continuing at the second and third steps, the predicted expected reward for continuing after observing x¹ does not exist because the expected value of max(x¹,x²) given x¹ does not exist. As the predictive density of x² is max(x¹,x²)⁻²…  Several alternatives are possible to bypass this impossible resolution, from changing the utility function to picking another reference prior.

For instance, using a prior like π(M)=1/M² l(and the same monetary return utility) leads to a proper optimal solution, namely

1. always wait for the second observation x²
2. stop at x² if x²>11x¹/12, else wait for x³
3. stop at x³ if x³>23 max(x¹,x²)/24, else observe x⁴

obtained analytically on a bar table in Rouen (and checked numerically later).

Another approach is to try to optimise the probability to pick the largest amount of the four x’s, but this is not leading to an interesting solution, since it corresponds to picking the first maximum after x¹, while picking the largest among remaining ones leads to a somewhat convoluted solution I have no patience to produce here! Plus this is not a really pertinent loss function as it does not discriminate enough against waiting…

## a secretary problem with maximum ability

Posted in Kids, R with tags , , , , on April 28, 2017 by xi'an

The Riddler of today has a secretary problem, where one measures sequentially N random variables until one deems the current variable to be the largest of the whole sample. The classical secretary problem has a counter-intuitive solution where one first measures N/e random variables without taking any decision and then and only then picks the first next outcome larger than the largest in the first group. (For large values of N.) The added information in the current riddle is that the distribution of those iid random variables is set to be uniform on {1,…,M}, which begs for a modification in the algorithm. As for instance when observing M on the current draw.

The approach I devised is certainly suboptimal, as I decided to pick the currently observed value if the (conditional) probability it is the largest is larger than the probability subsequent draws. This translates into the following R code:

M=100 #maximum value
N=10  #total number of draws
hyprob=function(m){
# m is sequence of draws so far
n=length(m);mmax=max(m)
if ((m[n]<mmax)||(mmax-n<N-n)){prob=0
}else{
prob=prod(sort((1:mmax)[-m],dec=TRUE)
[1:(N-n)]/((M-n):(M-N+1))}
return(prob)}

decision=function(draz=sample(1:M,N)){
i=0
keepgoin=TRUE
while ((keepgoin)&(i<N)){
i=i+1
keepgoin=(hyprob(draz[1:i])<0.5)}
return(c(i,draz[i],(draz[i]<max(draz))))}


which produces a winning rate of around 62% when N=10 and M=100, hence much better than the expected performances of the (asymptotic) secretary algorithm, with a winning frequency of 1/e. (For N=10 and M=100, the winning frequency is only 27%.)