[Using a biased coin with probability *p* to simulate a fair coin is straightforward.] Then flip the original coin n+1 times and produce a result of 1 if at least one toss gives heads. This happens with probability *√p*.
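The bracketed remark about fair coins can be checked with von Neumann's classical trick. Here is a minimal R sketch, where the function name `fair_from_biased` and the value p=0.2 are mine and purely illustrative:

```r
# von Neumann's trick: toss the p-coin twice until the two tosses differ,
# then return the first toss; (1,0) and (0,1) are equally likely outcomes.
# fair_from_biased is a hypothetical name, not taken from any package.
fair_from_biased <- function(p) {
  repeat {
    toss <- rbinom(2, 1, p)                      # two independent B(p) tosses
    if (toss[1] != toss[2]) return(toss[1])      # keep the first toss of a discordant pair
  }
}

set.seed(7)
fair <- replicate(1e4, fair_from_biased(0.2))
mean(fair)   # should be close to 0.5 whatever the value of p
```

The price is a random number of tosses, with 1/(p(1-p)) tosses needed on average per fair bit.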

Mendo generalises Wästlund's algorithm to functions expressed as a power series in *(1-p)*

with the sum of the weights being equal to one. This means proceeding through Bernoulli B(p) generations until one realisation is one or a probability

event occurs [which can be derived from a Bernoulli B(p) sequence]. Furthermore, this version achieves asymptotic optimality in the number of tosses, thanks to a form of Cramér-Rao lower bound. (Which makes yet another connection with Kolkata!)
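To make the stopping rule concrete, here is a hedged R sketch. It assumes the target is written as f(p) = 1 − Σ c_k (1−p)^k with non-negative weights c_k summing to one, and that the auxiliary event at step k has probability d_k = c_k/(1 − c_1 − … − c_{k−1}); this is my reading of the scheme rather than a transcription of the paper. The name `mendo_draw` is hypothetical, and the auxiliary event is drawn with `runif()` for simplicity instead of being derived from further B(p) tosses.

```r
# Sequential scheme: toss the p-coin; stop with output 1 on heads,
# otherwise stop with output 0 with probability d_k, otherwise move to step k+1.
# Assumption: d is a user-supplied function returning d_k for step k.
mendo_draw <- function(p, d) {
  k <- 1
  repeat {
    if (rbinom(1, 1, p) == 1) return(1)   # k-th toss gives heads
    if (runif(1) < d(k)) return(0)        # auxiliary event of probability d_k
    k <- k + 1
  }
}

# Example: f(p) = sqrt(p), whose series weights lead to d_k = 1/(2k)
set.seed(101)
p <- 0.25
est <- mean(replicate(1e4, mendo_draw(p, function(k) 1 / (2 * k))))
c(estimate = est, target = sqrt(p))
```

The output is 0 at step k with probability c_k (1−p)^k, since the product of the (1−d_j) for j<k times d_k equals c_k, so the output is 1 with probability f(p).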

Filed under: Statistics Tagged: Bernoulli factory, Cramer-Rao lower bound, Darjeeling, Debabrata Basu, Himalayas, India, Kangchenjunga, Kolkata, Lovàsz, Mark Huber, University of Calcutta

Filed under: Statistics Tagged: blog, JRSSB, Royal Statistical Society, Series B

Incredible India (or Incredible !ndia) is the slogan chosen by the Indian Ministry of Tourism to promote India. And it is indeed an incredible country, from its incredibly diverse landscapes [and not only the Himalayas!] and ecosystems, to its incredibly huge range of languages [although I found out during this trip that the differences between Urdu and Hindi are more communitarian and religious than linguistic, as they both derive from Hindustani, even though the alphabets differ completely] and religions [a mixed blessing], to its incredibly rich history and culture, to its incredibly wide range of local cuisines [as shown by the Bengali sample below, where the mustard seed fish cooked in banana leaves and the fried banana flowers are not visible!] and even wines [like Sula Vineyards, which offers a pretty nice Viognier]. Not to mention incredibly savoury teas from Darjeeling and Assam.

But India is also in-credible in that it is fairly hard to believe it can function at all, and still function it does! Despite or due to a massive bureaucracy, the federal and local states do not seem to operate with much or any efficiency [or such is the impression I gathered from my few trips there]. At least at the level of doing little against extreme poverty and extreme inequalities, or against massive air and water pollution [which puts India signing the Paris COP21 agreement under a bleak light, like this sun in the haze of a Kolkata highway], or towards urban planning, from garbage collection to traffic regulations, or women’s and children’s conditions. And the current BJP government seems more intent on encouraging Hindu nationalism and religion [despite India’s secular constitution] than on rationalising Indian bureaucracy and politics. That said, a side effect of the sudden demonetisation of 500 and 1000 rupee notes [which means one can only withdraw 2000 rupees at once, a slight nuisance when visiting India for a few days] may be a massive jump into a cash-free economy. In Kolkata I noticed the smallest street food stalls posting about pay-by-phone abilities. Since about everyone has a mobile phone, if phones can be used as virtual wallets, this may represent an incredible move towards that cash-free market. (But also a risk of massive fraud targeting those with no other means of payment.)

The country is thus incredible in its numerous ways of bypassing the State’s inaction, not all to be commended of course and with extreme consequences for the poorest fraction of the population. But far from being a dystopia, it may open a window on the future of metropolises all around the world, when environmental and migration pressures will see the collapse of our welfare states.

Filed under: Kids, Mountains, pictures, Running, Travel Tagged: air pollution, Bengali food, cash-free economy, cellphone, child labour, Darjeeling, ghee, India, Kolkata, panipuri, pollution, puri, Ravi Shankar, street food, traffic

Filed under: Books, Kids, pictures, Statistics, University life Tagged: Akashic Books, book review, exhibit, IHP, Institut Henri Poincaré, la maison des mathématiques, Paris, photograph, Vincent Moncorgé

Filed under: Kids Tagged: bike path, cat, data analysis, lint, pi, principal components, taxes, xkcd

Two sequences (x¹,x²,…) and (y¹,y²,…) are defined as follows: the current value of x is either the previous value or twice the previous value, while the current value of y is the sum of the values of x up to now. What is the minimum number of steps to reach 2016 or 2017?

By considering that all consecutive powers of 2 must appear at least once, the puzzle boils down to finding the minimal number of replications in the remainder of the year minus the sum of all powers of 2. Which itself boils down to deriving the binary decomposition of that remainder. Hence the basic R code (using intToBits):

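    deco=function(k=2016){
      # largest m such that 1+2+...+2^m does not exceed k
      m=trunc(log2(k))
      while (sum(2^(0:m))>k) m=m-1
      if (sum(2^(0:m))==k){
        # each power of 2 used exactly once
        return(rep(1,m+1))
      }else{
        # extra copies given by the binary decomposition of the remainder
        res=k-sum(2^(0:m))
        return(rep(1,m+1)+as.integer(intToBits(res))[1:(m+1)])
      }
    }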

which produces

    > sum(deco(2016))
    [1] 16
    > sum(deco(2017))
    [1] 16
    > sum(deco(1789))
    [1] 18
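As a sanity check on the binary-decomposition argument, here is a brute-force sketch exploring all reachable pairs (x,y) step by step. It assumes the sequences start from x=1 and y=1 at the first step, which the puzzle statement above does not spell out, and the name `min_steps` is hypothetical.

```r
# Breadth-first search over the reachable states (x, y).
# Assumption: the first step is x = 1, y = 1 (not stated in the puzzle text).
min_steps <- function(k) {
  states <- data.frame(x = 1, y = 1)   # states reachable after one step
  steps <- 1
  while (!any(states$y == k)) {
    nxt <- rbind(
      data.frame(x = states$x,     y = states$y + states$x),      # x unchanged
      data.frame(x = 2 * states$x, y = states$y + 2 * states$x)   # x doubled
    )
    states <- unique(nxt[nxt$y <= k, ])   # drop overshoots and duplicates
    steps <- steps + 1
  }
  steps
}

sapply(c(2016, 2017, 1789), min_steps)
```

Under that starting assumption, the search returns the same counts as the sums of the decompositions above, namely 16, 16 and 18.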

Filed under: Books, Kids, pictures, Statistics, Travel, University life Tagged: binary, intToBits(), Le Monde, mathematical puzzle, R

When accounting for the duration of computation between steps of an MCMC generation, the Markov chain turns into a Markov jump process, whose stationary distribution α is biased by the average delivery time. Unless it is constant. The authors manage this difficulty by interlocking the original chain with a secondary chain so that even- and odd-index chains are independent. The secondary chain is then discarded. This provides a way to run an anytime MCMC. The principle can be extended to K+1 chains, run one after the other, since only one of those chains needs to be discarded. It also applies to SMC and SMC². The appeal of anytime simulation in this particle setting is that resampling is no longer a bottleneck and is hence easily distributed among processors. One aspect I do not fully understand is how the computing budget is handled, since allocating the same real time to each iteration of SMC seems to envision each target in the sequence as requiring the same amount of time. (An interesting side remark made in this paper is the lack of exchangeability resulting from elaborate resampling mechanisms, a lack I had not thought of before.)
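The timing bias can be seen on a toy example, sketched below in R under assumptions of my own choosing: a two-state "chain" drawn uniformly at each step, hence with target (0.5, 0.5), and exponential computing times with means 1 and 3 for the two states. The function name `state_at_deadline` and all settings are hypothetical and not taken from the paper.

```r
# Toy illustration of the bias induced by stopping at a real-time deadline.
# Assumptions: i.i.d. uniform draws over states 1 and 2 (target 0.5/0.5),
# and a time spent on each state that is exponential with a state-dependent mean.
state_at_deadline <- function(deadline = 50, mean_time = c(1, 3)) {
  clock <- 0
  repeat {
    state <- sample(1:2, 1)                                 # next state of the chain
    clock <- clock + rexp(1, rate = 1 / mean_time[state])   # time spent on that state
    if (clock > deadline) return(state)                     # state occupied at the deadline
  }
}

set.seed(1)
draws <- replicate(2000, state_at_deadline())
prop.table(table(draws))   # close to (0.25, 0.75) rather than (0.5, 0.5)
```

The state observed at the deadline is weighted by its average holding time, here 0.5×1 against 0.5×3, which is the length bias the interlocked chains are meant to remove.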

Filed under: Books, Statistics Tagged: anytime algorithm, Cambridge University, computing cost, exchangeability, Harvard University, MCMC, SMC, SMC², University of Oxford, University of Warwick

Filed under: Kids, pictures, Travel Tagged: Argentina, Brazil, Cataratas del Iguazú, Iguazú Falls, jatp, lizards

Numerous medical problems ranging from screening to diagnosis to treatment of chronic diseases to management of care in hospitals require the development of novel statistical models and methods. These models and methods need to address the unique characteristics of medical data such as sampling bias, heterogeneity, non-stationarity, informative censoring, etc. Existing state-of-the-art machine learning and statistics techniques often fail to exploit those characteristics. Additionally, the focus needs to be on probabilistic models which are interpretable by clinicians so that the inference results can be integrated within medical decision-making.

We have access to unique datasets for clinical deterioration of patients in the hospital, for cancer screening, and for treatment of chronic diseases. Preliminary work has been tested and implemented at UCLA Medical Center, resulting in significantly improved management of care in this hospital.

The successful applicant will be expected to develop new probabilistic models and learning methods inspired by these applications. The focus will be primarily on methodological and theoretical developments, and involve collaborating with Oxford researchers in machine learning, computational statistics and medicine to bring these developments to practice.

The post-doctoral researcher will be jointly supervised by Prof. Mihaela van der Schaar and Prof. Arnaud Doucet. Both of them have a strong track record in advising PhD students and post-doctoral researchers who subsequently became successful academics in statistics, engineering sciences, computer science and economics. The position is for 2 years.

Filed under: Kids, pictures, Statistics, Travel, University life Tagged: Arnaud Doucet, England, Great-Britain, medical statistics, postdoctoral position, UCLA, University of Oxford

The authors then introduce a non-parametric EM algorithm, where the unknown prior becomes the “parameter” and the M step means optimising an entropy in terms of this prior. With an infinite amount of data, the true prior (meaning the overall distribution of the genuine parameters in this repeated experiment framework) is a fixed point of the algorithm. However, it seems that the only way it can be implemented is via discretisation of the parameter space, which opens a whole Pandora’s box of issues, from discretisation size to dimensionality problems. And to motivating the approach by regularisation arguments, since the final product remains an atomic distribution.
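For a sense of what the discretised version looks like, here is a minimal R sketch under assumptions of mine: observations x_i ~ N(θ_i, 1), a fixed grid of θ values, and the textbook EM update of the grid weights, which is not necessarily the authors' exact implementation. The function name `npem` and the simulated data are hypothetical.

```r
# Discretised non-parametric EM for the prior (illustrative setup only):
# x_i | theta_i ~ N(theta_i, 1), with the unknown prior replaced by weights w on a grid.
npem <- function(x, grid, iters = 200) {
  lik <- outer(x, grid, function(xi, th) dnorm(xi, mean = th))  # n x m matrix of f(x_i | theta_j)
  w <- rep(1 / length(grid), length(grid))                      # uniform starting prior
  loglik <- numeric(iters)
  for (t in seq_len(iters)) {
    marg <- as.vector(lik %*% w)            # marginal density of each x_i under the current prior
    loglik[t] <- sum(log(marg))             # log-likelihood before the update
    post <- sweep(lik, 2, w, "*") / marg    # E step: posterior weight of each grid point given x_i
    w <- colMeans(post)                     # M step: average the posteriors over observations
  }
  list(weights = w, loglik = loglik)
}

set.seed(42)
theta <- sample(c(-2, 2), 500, replace = TRUE)   # simulated "true" prior with two atoms
x <- rnorm(500, mean = theta)
fit <- npem(x, grid = seq(-5, 5, by = 0.25))
round(fit$weights, 3)
```

The output is by construction an atomic distribution on the grid, and the number of grid points grows exponentially with the dimension of θ, which is where the issues mentioned above come from.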

While the alternative of estimating the marginal density of the data by kernels and then aiming at the closest entropy prior is discussed, I find it surprising that the paper does not consider the rather natural option of setting a prior on the prior, e.g. via Dirichlet processes.

Filed under: Mountains, Statistics, Travel, University life Tagged: arXiv, Darjeeling, EM algorithm, empirical Bayes, I.J. Good, JASA, Kullback-Leibler divergence, MLE, non-parametrics, penalty, reparameterisation, Robbins-Monro algorithm