optimal Bernoulli factory

Posted in Statistics with tags , , , , , , , , , , on January 17, 2017 by xi'an

One of the last arXivals of the year was this paper by Luis Mendo on an optimal algorithm for Bernoulli factory (or Lovàsz‘s or yet Basu‘s) problems, i.e., for producing an unbiased estimate of f(p), 0<p<1, from an unrestricted number of Bernoulli trials with probability p of heads. (See, e.g., Mark Huber’s recent book for background.) This paper drove me to read an older 1999 unpublished document by Wästlund, unpublished because of the overlap with Keane and O’Brien (1994). One interesting gem in this document is that Wästlund produces a Bernoulli factory for the function f(p)=√p, which is not of considerable interest per se, but which was proposed to me as a puzzle by Professor Sinha during my visit to the Department of Statistics at the University of Calcutta. Based on his 1979 paper with P.K. Banerjee. The algorithm is based on a stopping rule N: throw a fair coin until the number of heads n+1 is greater than the number of tails n. The event N=2n+1 occurs with probability

${2n \choose n} \big/ 2^{2n+1}$

[Using a biased coin with probability p to simulate a fair coin is straightforward.] Then flip the original coin n+1 times and produce a result of 1 if at least one toss gives heads. This happens with probability √p.

Mendo generalises Wästlund‘s algorithm to functions expressed as a power series in (1-p)

$f(p)=1-\sum_{i=1}^\infty c_i(1-p)^i$

with the sum of the weights being equal to one. This means proceeding through Bernoulli B(p) generations until one realisation is one or a probability

$c_i\big/1-\sum_{j=1}^{i-1}c_j$

event occurs [which can be derived from a Bernoulli B(p) sequence]. Furthermore, this version achieves asymptotic optimality in the number of tosses, thanks to a form of Cramer-Rao lower bound. (Which makes yet another connection with Kolkata!)

a new Editor for Series B

Posted in Statistics with tags , , , on January 16, 2017 by xi'an

As every odd year, the Royal Statistical Society is seeking a new joint editor for Series B! After four years of dedication to the (The!) journal, Piotr Fryzlewicz is indeed going to retire from this duty by the end of 2017. Many thanks to Piotr for his unfailing involvement in Series B and the preservation of its uncompromising selection of papers! The call thus open for candidates for the next round of editorship, from 2018 to 2021, with a deadline of 31 January, 2017. Interested candidates should contact Martin Owen, at the Society’s address or by email at rss.org.uk with journal as recipient (local-part). The new editor will work with the current joint editor, David Dunson, whose term runs till December 2019. (I am also looking forward working with Piotr’s successor in developing the Series B blog, Series’ Blog!)

la maison des mathématiques

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , , on January 14, 2017 by xi'an

When I  worked with Jean-Michel Marin at Institut Henri Poincaré the week before Xmas, there was this framed picture standing on the ground, possibly in preparation for exhibition in the Institute. I found this superposition of the lady cleaning the blackboard from its maths formulas and of the seemingly unaware mathematician both compelling visually in the sheer geometric aesthetics of the act and somewhat appalling in its message. Especially when considering the initiatives taken by IHP towards reducing the gender gap in maths. After inquiring into the issue, I found that this picture was part of a whole photograph exhibit on IHP by Vincent Moncorgé, now published into a book, La Maison des Mathématiques by Villani, Uzan, and Moncorgé. Most pictures are on-line and I found them quite appealing. Except again for the above.

Le Monde puzzle [#990]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , on January 12, 2017 by xi'an

To celebrate the new year (assuming it is worth celebrating!), Le Monde mathematical puzzle came up with the following:

Two sequences (x¹,x²,…) and (y¹,y²,…) are defined as follows: the current value of x is either the previous value or twice the previous value, while the current value of y is the sum of the values of x up to now. What is the minimum number of steps to reach 2016 or 2017?

By considering that all consecutive powers of 2 must appear at least one, the puzzles boils down to finding the minimal number of replications in the remainder of the year minus the sum of all powers of 2. Which itself boils down to deriving the binary decomposition of that remainder. Hence the basic R code (using intToBits):

deco=function(k=2016){
m=trunc(log2(k))
while (sum(2^(0:m))>k) m=m-1
if (sum(2^(0:m))==k){ return(rep(1,m+1))
}else{
res=k-sum(2^(0:m))
return(rep(1,m+1)+as.integer(intToBits(res))[1:(m+1)])


which produces

> sum(deco(2016))
[1] 16
> sum(deco(2017))
[1] 16
> sum(deco(1789))
[1] 18


anytime algorithm

Posted in Books, Statistics with tags , , , , , , , , , on January 11, 2017 by xi'an

Lawrence Murray, Sumeet Singh, Pierre Jacob, and Anthony Lee (Warwick) recently arXived a paper on Anytime Monte Carlo. (The earlier post on this topic is no coincidence, as Lawrence had told me about this problem when he visited Paris last Spring. Including a forced extension when his passport got stolen.) The difficulty with anytime algorithms for MCMC is the lack of exchangeability of the MCMC sequence (except for formal settings where regeneration can be used).

When accounting for duration of computation between steps of an MCMC generation, the Markov chain turns into a Markov jump process, whose stationary distribution α is biased by the average delivery time. Unless it is constant. The authors manage this difficulty by interlocking the original chain with a secondary chain so that even- and odd-index chains are independent. The secondary chain is then discarded. This provides a way to run an anytime MCMC. The principle can be extended to K+1 chains, run one after the other, since only one of those chains need be discarded. It also applies to SMC and SMC². The appeal of anytime simulation in this particle setting is that resampling is no longer a bottleneck. Hence easily distributed among processors. One aspect I do not fully understand is how the computing budget is handled, since allocating the same real time to each iteration of SMC seems to envision each target in the sequence as requiring the same amount of time. (An interesting side remark made in this paper is the lack of exchangeability resulting from elaborate resampling mechanisms, lack I had not thought of before.)

learning and inference for medical discovery in Oxford [postdoc]

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , on January 10, 2017 by xi'an

[Here is a call for a two-year postdoc in Oxford sent to me by Arnaud Doucet. For those worried about moving to Britain, I think that, given the current pace—or lack thereof—of the negotiations with the EU, it is very likely that Britain will not have Brexited two years from now.]

Numerous medical problems ranging from screening to diagnosis to treatment of chronic diseases to  management of care in hospitals requires the development of novel statistical models and methods. These models and methods need to address the unique characteristics of medical data such as sampling bias, heterogeneity, non-stationarity, informative censoring etc. Existing state-of-the-art machine learning and statistics techniques often fail to exploit those characteristics. Additionally, the focus needs to be on probabilistic models which are
interpretable by the clinicians so that the inference results can be integrated within the medical-decision making.

We have access to unique datasets for clinical deterioration of patients in the hospital, for cancer screening, and for treatment of chronic diseases. Preliminary work has been tested and implemented at UCLA Medical Center, resulting in significantly management care in this hospital.

The successful applicant will be expected to develop new probabilistic models and learning methods inspired by these applications. The focus will be primarily on methodological and theoretical developments, and involve collaborating with Oxford researchers in machine learning, computational statistics and medicine to bring these developments to practice.

The post-doctoral researcher will be jointly supervised by Prof. Mihaela van der Schaar and Prof. Arnaud Doucet. Both of them have a strong track-record in advising PhD students and post-doctoral researchers who subsequently became successful academics in statistics, engineering sciences, computer science and economics. The position is for 2 years.

empirical Bayes, reference priors, entropy & EM

Posted in Mountains, Statistics, Travel, University life with tags , , , , , , , , , , , on January 9, 2017 by xi'an

Klebanov and co-authors from Berlin arXived this paper a few weeks ago and it took me a quiet evening in Darjeeling to read it. It starts with the premises that led Robbins to introduce empirical Bayes in 1956 (although the paper does not appear in the references), where repeated experiments with different parameters are run. Except that it turns non-parametric in estimating the prior. And to avoid resorting to the non-parametric MLE, which is the empirical distribution, it adds a smoothness penalty function to the picture. (Warning: I am not a big fan of non-parametric MLE!) The idea seems to have been Good’s, who acknowledged using the entropy as penalty is missing in terms of reparameterisation invariance. Hence the authors suggest instead to use as penalty function on the prior a joint relative entropy on both the parameter and the prior, which amounts to the average of the Kullback-Leibler divergence between the sampling distribution and the predictive based on the prior. Which is then independent of the parameterisation. And of the dominating measure. This is the only tangible connection with reference priors found in the paper.

The authors then introduce a non-parametric EM algorithm, where the unknown prior becomes the “parameter” and the M step means optimising an entropy in terms of this prior. With an infinite amount of data, the true prior (meaning the overall distribution of the genuine parameters in this repeated experiment framework) is a fixed point of the algorithm. However, it seems that the only way it can be implemented is via discretisation of the parameter space, which opens a whole Pandora box of issues, from discretisation size to dimensionality problems. And to motivating the approach by regularisation arguments, since the final product remains an atomic distribution.

While the alternative of estimating the marginal density of the data by kernels and then aiming at the closest entropy prior is discussed, I find it surprising that the paper does not consider the rather natural of setting a prior on the prior, e.g. via Dirichlet processes.