BayesComp’20

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on January 10, 2020 by xi'an

First, I really have to congratulate my friend Jim Hobert for a great organisation of the meeting adopting my favourite minimalist principles (no name tag, no “goodies” apart from the conference schedule, no official talks). Without any pretense at objectivity, I also appreciated very much the range of topics and the sweet frustration of having to choose between two or three sessions each time. Here are some notes taken during some talks (with no implicit implication for the talks no mentioned, re. above frustration! as well as very short nights making sudden lapse in concentration highly likely).

On Day 1, Paul Fearnhead’s inaugural plenary talk was on continuous time Monte Carlo methods, mostly bouncy particle and zig-zag samplers, with a detailed explanation on the simulation of the switching times which likely brought the audience up to speed even if they had never heard of them. And an opening on PDMPs used as equivalents to reversible jump MCMC, reminding me of the continuous time (point process) solutions of Matthew Stephens for mixture inference (and of Preston, Ripley, Møller).

The same morn I heard of highly efficient techniques to handle very large matrices and p>n variables selections by Akihiko Nishimura and Ruth Baker on a delayed acceptance ABC, using a cheap proxy model. Somewhat different from indirect inference. I found the reliance on ESS somewhat puzzling given the intractability of the likelihood (and the low reliability of the frequency estimate) and the lack of connection with the “real” posterior. At the same ABC session, Umberto Picchini spoke on a joint work with Richard Everitt (Warwick) on linking ABC and pseudo-marginal MCMC by bootstrap. Actually, the notion of ABC likelihood was already proposed as pseudo-marginal ABC by Anthony Lee, Christophe Andrieu and Arnaud Doucet in the discussion of Fearnhead and Prangle (2012) but I wonder at the focus of being unbiased when the quantity is not the truth, i.e. the “real” likelihood. It would seem more appropriate to attempt better kernel estimates on the distribution of the summary itself. The same session also involved David Frazier who linked our work on ABC for misspecified models and an on-going investigation of synthetic likelihood.

Later, there was a surprise occurrence of the Bernoulli factory in a talk by Radu Herbei on Gaussian process priors with accept-reject algorithms, leading to exact MCMC, although the computing implementation remains uncertain. And several discussions during the poster session, incl. one on the planning of a 2021 workshop in Oaxaca centred on objective Bayes advances as we received acceptance of our proposal by BIRS today!

On Day 2, David Blei gave a plenary introduction to variational Bayes inference and latent Dirichlet allocations, somewhat too introductory for my taste although other participants enjoyed this exposition. He also mentioned a recent JASA paper on the frequentist consistency of variational Bayes that I should check. Speaking later with PhD students, they really enjoyed this opening on an area they did not know that well.

A talk by Kengo Kamatani (whom I visited last summer) on improved ergodicity rates for heavy tailed targets and Crank-NIcholson modifications to the random walk proposal (which uses an AR(1) representation instead of the random walk). With the clever idea of adding the scale of the proposal as an extra parameter with a prior of its own. Gaining one order of magnitude in the convergence speed (i.e. from d to 1 and from d² to d, where d is the dimension), which is quite impressive (and just published in JAP).Veronica Rockova linked Bayesian variable selection and machine learning via ABC, with conditions on the prior for model consistency. And a novel approach using part of the data to learn an ABC partial posterior, which reminded me of the partial  Bayes factors of the 1990’s although it is presumably unrelated. And a replacement of the original rejection ABC via multi-armed bandits, where each variable is represented by an arm, called ABC Bayesian forests. Recalling the simulation trick behind Thompson’s approach, reproduced for the inclusion or exclusion of variates and producing a fixed estimate for the (marginal) inclusion probabilities, which makes it sound like a prior-feeback form of empirical Bayes. Followed by a talk of Gregor Kastner on MCMC handling of large time series with specific priors and a massive number of parameters.

The afternoon also had a wealth of exciting talks and missed opportunities (in the other sessions!). Which ended up with a strong if unintended French bias since I listened to Christophe Andrieu, Gabriel Stolz, Umut Simsekli, and Manon Michel on different continuous time processes, with Umut linking GANs, multidimensional optimal transport, sliced-Wasserstein, generative models, and new stochastic differential equations. Manon Michel gave a highly intuitive talk on creating non-reversibility, getting rid of refreshment rates in PDMPs to kill any form of reversibility.

riddles on Egyptian fractions and Bernoulli factories

Posted in Books, Kids, R with tags , , , , , , , , , , , , , on June 11, 2019 by xi'an

Two fairy different riddles on the weekend Riddler. The first one is (in fine) about Egyptian fractions: I understand the first one as

Find the Egyptian fraction decomposition of 2 into 11 distinct unit fractions that maximises the smallest fraction.

And which I cannot solve despite perusing this amazing webpage on Egyptian fractions and making some attempts at brute force  random exploration. Using Fibonacci’s greedy algorithm. I managed to find such decompositions

2 = 1 +1/2 +1/6 +1/12 +1/16 +1/20 +1/24 +1/30 +1/42 +1/48 +1/56

after seeing in this short note

2 = 1 +1/3 +1/5 +1/7 +1/9 +1/42 +1/15 +1/18 +1/30 +1/45 +1/90

And then Robin came with the following:

2 = 1 +1/4 +1/5 +1/8 +1/10 +1/12 +1/15 +1/20 +1/21 +1/24 +1/28

which may prove to be the winner! But there is even better:

2 = 1 +1/5 +1/6 +1/8 +1/9 +1/10 +1/12 +1/15 +1/18 +1/20 +1/24

The second riddle is a more straightforward Bernoulli factory problem:

Given a coin with a free-to-choose probability p of head, design an experiment with a fixed number k of draws that returns three outcomes with equal probabilities.

For which I tried a brute-force search of all possible 3-partitions of the 2-to-the-k events for a range of values of p from .01 to .5 and for k equal to 3,4,… But never getting an exact balance between the three groups. Reading later the solution on the Riddler, I saw that there was an exact solution for 4 draws when

$p=\frac{3-\sqrt{3(4\sqrt{9}-6)}}{6}$

Augmenting the precision of my solver (by multiplying all terms by 100), I indeed found a difference of

> solver((3-sqrt(3*(4*sqrt(6)-9)))/6,ba=1e5)[1]
[1] 8.940697e-08

which means an error of 9 x 100⁻⁴ x 10⁻⁸, ie roughly 10⁻¹⁵.

Bernoulli race particle filters

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , on March 27, 2019 by xi'an

Sebastian Schmon, Arnaud Doucet and George Deligiannidis have recently arXived an AISTATS paper with the above nice title. The motivation for the extension is facing intractable particle weights for state space models, as for instance in discretised diffusions.  In most cases, actually, the weight associated with the optimal forward proposal involves an intractable integral which is the predictive of the current observed variate given the past hidden states. And in some cases, there exist unbiased and non-negative estimators of the targets,  which can thus be substituted, volens nolens,  to the original filter. As in many pseudo-marginal derivations, this new algorithm can be interpreted as targeting an augmented distribution that involves the auxiliary random variates behind the unbiased estimators of the particle weights. A worthwhile remark since it allows for the preservation of the original target as in (8) provided the auxiliary random variates are simulated from the right conditionals. (At least ideally as I have no clue when this is feasible.)

“if Bernoulli resampling is per-formed, the variance for any Monte Carlo estimate will be the same as if the true weights were known and one applies standard multinomial resampling.”

The Bernoulli race in the title stands for a version of the Bernoulli factory problem, where an intractable and bounded component of the weight can be turned into a probability, for which a Bernoulli draw is available, hence providing a Multinomial sampling with the intractable weights since replacing the exact probability with an estimate does not modify the Bernoulli distribution, amazingly so! Even with intractable normalising constants in particle filters. The practicality of the approach may however be restricted by the possibility of some intractable terms being very small and requiring many rejections for one acceptance, as the number of attempts is a compound geometric. The intractability may add to the time request the drawback of keeping this feature hidden as well. Or force some premature interruption in the settings of a parallel implementation.

Barker at the Bernoulli factory

Posted in Books, Statistics with tags , , , , , , , on October 5, 2017 by xi'an

Yesterday, Flavio Gonçalves, Krzysztof Latuszýnski, and Gareth Roberts (Warwick) arXived a paper on Barker’s algorithm for Bayesian inference with intractable likelihoods.

“…roughly speaking Barker’s method is at worst half as good as Metropolis-Hastings.”

Barker’s acceptance probability (1965) is a smooth if less efficient version of Metropolis-Hastings. (Barker wrote his thesis in Adelaide, in the Mathematical Physics department. Most likely, he never interacted with Ronald Fisher, who died there in 1962) This smoothness is exploited by devising a Bernoulli factory consisting in a 2-coin algorithm that manages to simulate the Bernoulli variable associated with the Barker probability, from a coin that can simulate Bernoulli’s with probabilities proportional to [bounded] π(θ). For instance, using a bounded unbiased estimator of the target. And another coin that simulates another Bernoulli on a remainder term. Assuming the bound on the estimate of π(θ) is known [or part of the remainder term]. This is a neat result in that it expands the range of pseudo-marginal methods (and resuscitates Barker’s formula from oblivion!). The paper includes an illustration in the case of the far-from-toyish Wright-Fisher diffusion. [Making Fisher and Barker meeting, in the end!]

optimal Bernoulli factory

Posted in Statistics with tags , , , , , , , , , , on January 17, 2017 by xi'an

One of the last arXivals of the year was this paper by Luis Mendo on an optimal algorithm for Bernoulli factory (or Lovàsz‘s or yet Basu‘s) problems, i.e., for producing an unbiased estimate of f(p), 0<p<1, from an unrestricted number of Bernoulli trials with probability p of heads. (See, e.g., Mark Huber’s recent book for background.) This paper drove me to read an older 1999 unpublished document by Wästlund, unpublished because of the overlap with Keane and O’Brien (1994). One interesting gem in this document is that Wästlund produces a Bernoulli factory for the function f(p)=√p, which is not of considerable interest per se, but which was proposed to me as a puzzle by Professor Sinha during my visit to the Department of Statistics at the University of Calcutta. Based on his 1979 paper with P.K. Banerjee. The algorithm is based on a stopping rule N: throw a fair coin until the number of heads n+1 is greater than the number of tails n. The event N=2n+1 occurs with probability

${2n \choose n} \big/ 2^{2n+1}$

[Using a biased coin with probability p to simulate a fair coin is straightforward.] Then flip the original coin n+1 times and produce a result of 1 if at least one toss gives heads. This happens with probability √p.

Mendo generalises Wästlund‘s algorithm to functions expressed as a power series in (1-p)

$f(p)=1-\sum_{i=1}^\infty c_i(1-p)^i$

with the sum of the weights being equal to one. This means proceeding through Bernoulli B(p) generations until one realisation is one or a probability

$c_i\big/1-\sum_{j=1}^{i-1}c_j$

event occurs [which can be derived from a Bernoulli B(p) sequence]. Furthermore, this version achieves asymptotic optimality in the number of tosses, thanks to a form of Cramer-Rao lower bound. (Which makes yet another connection with Kolkata!)

exam question

Posted in Kids, Statistics, University life with tags , , , , , , , , , on June 24, 2016 by xi'an

A question for my third year statistics exam that I borrowed from Cross Validated: no student even attempted to solve this question…!

And another one borrowed from the highly popular post on the random variable [almost] always smaller than its mean!

a Bernoulli factory of sorts?

Posted in Books, Kids, Statistics with tags , , , , , on May 10, 2016 by xi'an

A nice question was posted on X validated as to figure out a way to simulate a Bernoulli B(q) variate when using only a Bernoulli B(p) generator. With the additional question of handling the special case q=a/b, a rational probability. This is not exactly a Bernoulli factory problem in that q does not write as f(p), but still a neat challenge. My solution would have been similar to the one posted by William Huber, namely to simulate a sequence of B(p) or B(1-p) towards zooming on q until the simulation of the underlying uniforms U allows us to conclude at the position of U wrt q. For instance, if p>q and X~B(p) is equal to zero, the underlying uniform is more than p, hence more than q, leading to returning zero for the B(q) generation. Else, a second B(p) or B(1-p) generation means breaking the interval (0,p) into two parts, one of which allows for stopping the generation, and so on. The solution posted by William Huber contains an R code that could be easily improved by choosing for each interval between p and (1-p) towards the maximal probability of stopping. I still wonder at the ultimate optimal solution that would minimise the (average or median) number of calls to the Bernoulli(p) generator.