BayesComp’20

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on January 10, 2020 by xi'an

First, I really have to congratulate my friend Jim Hobert for a great organisation of the meeting adopting my favourite minimalist principles (no name tag, no “goodies” apart from the conference schedule, no official talks). Without any pretense at objectivity, I also appreciated very much the range of topics and the sweet frustration of having to choose between two or three sessions each time. Here are some notes taken during some talks (with no implicit implication for the talks no mentioned, re. above frustration! as well as very short nights making sudden lapse in concentration highly likely).

On Day 1, Paul Fearnhead’s inaugural plenary talk was on continuous time Monte Carlo methods, mostly bouncy particle and zig-zag samplers, with a detailed explanation on the simulation of the switching times which likely brought the audience up to speed even if they had never heard of them. And an opening on PDMPs used as equivalents to reversible jump MCMC, reminding me of the continuous time (point process) solutions of Matthew Stephens for mixture inference (and of Preston, Ripley, Møller).

The same morn I heard of highly efficient techniques to handle very large matrices and p>n variables selections by Akihiko Nishimura and Ruth Baker on a delayed acceptance ABC, using a cheap proxy model. Somewhat different from indirect inference. I found the reliance on ESS somewhat puzzling given the intractability of the likelihood (and the low reliability of the frequency estimate) and the lack of connection with the “real” posterior. At the same ABC session, Umberto Picchini spoke on a joint work with Richard Everitt (Warwick) on linking ABC and pseudo-marginal MCMC by bootstrap. Actually, the notion of ABC likelihood was already proposed as pseudo-marginal ABC by Anthony Lee, Christophe Andrieu and Arnaud Doucet in the discussion of Fearnhead and Prangle (2012) but I wonder at the focus of being unbiased when the quantity is not the truth, i.e. the “real” likelihood. It would seem more appropriate to attempt better kernel estimates on the distribution of the summary itself. The same session also involved David Frazier who linked our work on ABC for misspecified models and an on-going investigation of synthetic likelihood.

Later, there was a surprise occurrence of the Bernoulli factory in a talk by Radu Herbei on Gaussian process priors with accept-reject algorithms, leading to exact MCMC, although the computing implementation remains uncertain. And several discussions during the poster session, incl. one on the planning of a 2021 workshop in Oaxaca centred on objective Bayes advances as we received acceptance of our proposal by BIRS today!

On Day 2, David Blei gave a plenary introduction to variational Bayes inference and latent Dirichlet allocations, somewhat too introductory for my taste although other participants enjoyed this exposition. He also mentioned a recent JASA paper on the frequentist consistency of variational Bayes that I should check. Speaking later with PhD students, they really enjoyed this opening on an area they did not know that well.

A talk by Kengo Kamatani (whom I visited last summer) on improved ergodicity rates for heavy tailed targets and Crank-NIcholson modifications to the random walk proposal (which uses an AR(1) representation instead of the random walk). With the clever idea of adding the scale of the proposal as an extra parameter with a prior of its own. Gaining one order of magnitude in the convergence speed (i.e. from d to 1 and from d² to d, where d is the dimension), which is quite impressive (and just published in JAP).Veronica Rockova linked Bayesian variable selection and machine learning via ABC, with conditions on the prior for model consistency. And a novel approach using part of the data to learn an ABC partial posterior, which reminded me of the partial  Bayes factors of the 1990’s although it is presumably unrelated. And a replacement of the original rejection ABC via multi-armed bandits, where each variable is represented by an arm, called ABC Bayesian forests. Recalling the simulation trick behind Thompson’s approach, reproduced for the inclusion or exclusion of variates and producing a fixed estimate for the (marginal) inclusion probabilities, which makes it sound like a prior-feeback form of empirical Bayes. Followed by a talk of Gregor Kastner on MCMC handling of large time series with specific priors and a massive number of parameters.

The afternoon also had a wealth of exciting talks and missed opportunities (in the other sessions!). Which ended up with a strong if unintended French bias since I listened to Christophe Andrieu, Gabriel Stolz, Umut Simsekli, and Manon Michel on different continuous time processes, with Umut linking GANs, multidimensional optimal transport, sliced-Wasserstein, generative models, and new stochastic differential equations. Manon Michel gave a highly intuitive talk on creating non-reversibility, getting rid of refreshment rates in PDMPs to kill any form of reversibility.

computational statistics and molecular simulation [18w5023]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , on November 14, 2018 by xi'an

On Day 2, Carsten Hartmann used a representation of the log cumulant as solution to a minimisation problem over a collection of importance functions (by the Vonsker-Varadhan principle), with links to X entropy and optimal control, a theme also considered by Alain Dunmus when considering the uncorrected discretised Langevin diffusion with a decreasing sequence of discretisation scale factors (Jordan, Kinderlehrer and Otto) in the spirit of convex regularisation à la Rockafellar. Also representing ULA as an inexact gradient descent algorithm. Murray Pollock (Warwick) presented a new technique called fusion to simulate from products of d densities, as in scalable MCMC (but not only). With an (early) starting and startling remark that when simulating one realisation from each density in the product and waiting for all of them to be equal means simulating from the product, in a strong link to the (A)BC fundamentals. This is of course impractical and Murray proposes to follow d Brownian bridges all ending up in the average of these simulations, constructing an acceptance probability that is computable and validating the output.

The second “hand-on” lecture was given by Gareth Roberts (Warwick) on the many aspects of scaling MCMC algorithms, which started with the famous 0.234 acceptance rate paper in 1996. While I was aware of some of these results (!), the overall picture was impressive, including a notion of complexity I had not seen before. And a last section on PDMPs where Gareth presented very recent on the different scales of convergence of Zigzag and bouncy particle samplers, mostly to the advantage of Zigzag.In the afternoon, Jeremy Heng presented a continuous time version of simulated tempering by adding a drift to the Langevin diffusion with time-varying energy, which must be solution to the Liouville pde $\text{div} \pi_t f = \partial_t \pi_t$. Which connects to a flow transport problem when solving the pde under additional conditions. Unclear to me was the creation of the infinite sequence. This talk was very much at the interface in the spirit of the workshop! (Maybe surprisingly complex when considering the endpoint goal of simulating from a given target.) Jonathan Weare’s talk was about quantum chemistry which translated into finding eigenvalues of an operator. Turning in to a change of basis in a inhumanly large space (10¹⁸⁰ dimensions!). Matt Moore presented the work on Raman spectroscopy he did while a postdoc at Warwick, with an SMC based classification of the peaks of a spectrum (to be used on Mars?) and Alessandra Iacobucci (Dauphine) showed us the unexpected thermal features exhibited by simulations of chains of rotors subjected to both thermal and mechanical forcings, which we never discussed in Dauphine beyond joking on her many batch jobs running on our cluster!

And I remembered today that there is currently and in parallel another BIRS workshop on statistical model selection [and a lot of overlap with our themes] taking place in Banff! With snow already there! Unfair or rather #unfair, as someone much too well-known would whine..! Not that I am in a position to complain about the great conditions here in Oaxaca (except for having to truly worry about stray dogs rather than conceptually about bears makes running more of a challenge, if not the altitude since both places are about the same).

MCqMC 2018, Rennes [slides]

Posted in Statistics with tags , , , , , on July 3, 2018 by xi'an

Here are my slides for the talk I give this morning at MCqMC 20188. Based on slides first written by Changye Wu and on our joint papers. As it happens, I was under the impression I would give a survey on partially deterministic Markov processes. But, as it goes (!), my talk takes place after a superb plenary talk by Christophe Andrieu on non-reversibility, where he gave motivations for recoursing to non-reversibility and general results for variance reduction, plus a whole session on the topic by Jorens Bierkens, Alex Thiéry, Alain Durmus, and Arnak Dalalyan (CREST), which covered the topics in the following slides, only better! Reducing the informative contents of my talk to the alternative to the Zig-Zag sampler Changye proposed, which makes the talk of limited appeal, I am afraid. (There are four other sessions at the same time, fortunately!)

Bouncing bouncy particle papers

Posted in Books, pictures, Statistics, University life with tags , , , , on July 27, 2017 by xi'an

Yesterday, two papers on bouncy particle samplers simultaneously appeared on arXiv, arxiv:1707.05200 by Chris Sherlock and Alex Thiery, and arxiv:1707.05296 by Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet. As a coordinated move by both groups of authors who had met the weeks before at the Isaac Newton Institute in Cambridge.

The paper by Sherlock and Thiery, entitled a discrete bouncy particle sampler, considers a delayed rejection approach that only requires point-wise evaluations of the target density. The delay being into making a speed flip move after a proposal involving a flip in the speed and a drift in the variable of interest is rejected. To achieve guaranteed ergodicity, they add a random perturbation as in our recent paper, plus another perturbation based on a Brownian argument. Given that this is a discretised version of the continuous-time bouncy particle sampler, the discretisation step δ need be calibrated. The authors follow a rather circumvoluted argument to argue in favour of seeking a maximum number of reflections (for which I have obviously no intuition). Overall, I find it hard to assess how much of an advance this is, even when simulations support the notion of a geometric convergence.

“Our results provide a cautionary example that in certain high-dimensional scenarios, it is still preferable to perform refreshment even when randomized bounces are used.” Vanetti et al.

The paper by Paul Vanetti and co-authors has a much more ambitious scale in that it unifies most of the work done so far in this area and relates piecewise deterministic processes, Hamiltonian Monte Carlo, and discrete versions, containing on top fine convergence results. The main idea is to improve upon the existing deterministic methods by taking (more) into account the target density. Hence the use of a bouncy particle sampler associated with the Hamiltonian (as in HMC). This borrows from an earlier slice sampler idea of Iain Murray, Ryan Adams, and David McKay (AISTATS 2010), exploiting an exact Hamiltonian dynamics for an approximation to the true target to explore its support. Except that bouncing somewhat avoids the slice step. The [eight] discrete bouncy particle particle samplers derived from this framework are both correct against the targeted distribution and do not require the simulation of event times. The paper distinguishes between global and local versions, the later exploiting conditional independence properties in the (augmented) target. Which sounds like a version of multiple slice sampling.

forward event-chain Monte Carlo

Posted in Statistics, Travel, University life with tags on July 24, 2017 by xi'an

One of the authors of this paper contacted me to point out their results arXived last February [and revised last month] as being related to our bouncy particle paper arXived two weeks ago. And to an earlier paper by Michel et al. (2014) published in the Journal of Chemical Physics. (The authors actually happen to work quite nearby, on a suburban road I take every time I bike to Dauphine!) I think one reason we missed this paper in our literature survey is the use of a vocabulary taken from Physics rather than our Monte Carlo community, as in, e.g., using “event chain” instead of “bouncy particle”… The paper indeed contains schemes similar to ours, as did the on-going work by Chris Sherlock and co-authors Chris presented last week at the Isaac Newton Institute workshop on scalability. (Although I had troubles reading its physics style, in particular the justification for stationarity or “global balance” and the use of “infinitesimals”.)

“…we would like to find the optimal set of directions {e} necessary for the ergodicity and  allowing for an efficient exploration of the target distribution.”

The improvement sought is about improving the choice of the chain direction at each direction change. In order to avoid the random walk behaviour. The proposal is to favour directions close to the gradient of the log-likelihood, keeping the orthogonal to this gradient constant in direction (as in our paper) if not in scale. (As indicated above I have trouble understanding the ergodicity proof, if not the irreducibility. I also do not see how solving (11), which should be (12), is feasible in general. And why (29) amounts to simulating from (27)…)