## back to the Bernoulli factory

Posted in Books, Statistics, University life with tags , , , on April 7, 2020 by xi'an

“The results show that the proposed algorithm is asymptotically optimal for the mentioned subclass of functions, in the sense that for any other fast algorithm E[N] grows at least as fast with p.”

Murray Pollock (Warwick U. for a wee more days!) pointed out to me this paper of Luis Mendo on a Bernoulli factory algorithm that estimates functions [of p] that can be expressed as power series [of p]. Essentially functions f(p) such that f(0)=0 and f(1)=1. The big difference with earlier algorithms, as far as I can tell, is that the approach involves a randomised stopping rule that involves, on top of the unlimited sequence of Bernoulli B(p) variates a second sequence of Uniform variates, which sounds to me like a change of paradigm, given the much higher degree of freedom brought by Uniform variates (as opposed to Bernoulli variates with an unknown value of p). Although there exists a non-randomised version in the paper. The proposed algorithm is as follows, using a sequence of d’s issued from the power series coefficients:

1. Set i=1.
2. Take one input X[i].
3. Produce U[i] uniform on (0,1). Let V[i]=1 if U[i]<d[i] and V[i]=0 otherwise.
If V[i] or X[i] are equal to 1, output X[i] and finish.
Else increase i and go back to step2.

As the author mentions, this happens to be a particular case of the reverse-time martingale approach of Łatuszynski, Kosmidis, Papaspiliopoulos and Roberts (Warwick connection as well!). With an average number of steps equal to f(p)/p, surprisingly simple, and somewhat of an optimal rate. While the functions f(p) are somewhat restricted, this is nice work

## double descent

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on November 7, 2019 by xi'an Last Friday, I [and a few hundred others!] went to the SMILE (Statistical Machine Learning in Paris) seminar where Francis Bach was giving a talk. (With a pleasant ride from Dauphine along the Seine river.) Fancis was talking about the double descent phenomenon observed in recent papers by Belkin & al. (2018, 2019), and Mei & Montanari (2019). (As the seminar room at INRIA was quite crowded and as I was sitting X-legged on the floor close to the screen, I took a few slides from below!) The phenomenon is that the usual U curve warning about over-fitting and reproduced in most statistics and machine-learning courses can under the right circumstances be followed by a second decrease in the testing error when the number of features goes beyond the number of observations. This is rather puzzling and counter-intuitive, so I briefkly checked the 2019 [8 pages] article by Belkin & al., who are studying two examples, including a standard “large p small n” Gaussian regression. where the authors state that

“However, as p grows beyond n, the test risk again decreases, provided that the model is fit using a suitable inductive bias (e.g., least norm solution). “

One explanation [I found after checking the paper] is that the variates (features) in the regression are selected at random rather than in an optimal sequential order. Double descent is missing with interpolating and deterministic estimators. Hence requiring on principle all candidate variates to be included to achieve minimal averaged error. The infinite spike is when the number p of variate is near the number n of observations. (The expectation accounts as well for the randomisation in T. Randomisation that remains an unclear feature in this framework…) ## O’Bayes in action

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , on November 7, 2017 by xi'an My next-door colleague [at Dauphine] François Simenhaus shared a paradox [to be developed in an incoming test!] with Julien Stoehr and I last week, namely that, when selecting the largest number between a [observed] and b [unobserved], drawing a random boundary on a [meaning that a is chosen iff a is larger than this boundary] increases the probability to pick the largest number above ½2…

When thinking about it in the wretched RER train [train that got immobilised for at least two hours just a few minutes after I went through!, good luck to the passengers travelling to the airport…] to De Gaulle airport, I lost the argument: if a<b, the probability [for this random bound] to be larger than a and hence for selecting b is 1-Φ(a), while, if a>b, the probability [of winning] is Φ(a). Hence the only case when the probability is ½ is when a is the median of this random variable. But, when discussing the issue further with Julien, I exposed an interesting non-informative prior characterisation. Namely, if I assume a,b to be iid U(0,M) and set an improper prior 1/M on M, the conditional probability that b>a given a is ½. Furthermore, the posterior probability to pick the right [largest] number with François’s randomised rule is also ½, no matter what the distribution of the random boundary is. Now, the most surprising feature of this coffee room derivation is that these properties only hold for the prior 1/M. Any other power of M will induce an asymmetry between a and b. (The same properties hold when a,b are iid Exp(M).)  Of course, this is not absolutely unexpected since 1/M is the invariant prior and since the “intuitive” symmetry only holds under this prior. Power to O’Bayes!

When discussing again the matter with François yesterday, I realised I had changed his wording of the puzzle. The original setting is one with two cards hiding the unknown numbers a and b and of a player picking one of the cards. If the player picks a card at random, there is indeed a probability of ½ of picking the largest number. If the decision to switch or not depends on an independent random draw being larger or smaller than the number on the observed card, the probability to get max(a,b) in the end hits 1 when this random draw falls into (a,b) and remains ½ outside (a,b). Randomisation pays.