computational statistics and molecular simulation [18w5023]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , on November 14, 2018 by xi'an

On Day 2, Carsten Hartmann used a representation of the log cumulant as solution to a minimisation problem over a collection of importance functions (by the Vonsker-Varadhan principle), with links to X entropy and optimal control, a theme also considered by Alain Dunmus when considering the uncorrected discretised Langevin diffusion with a decreasing sequence of discretisation scale factors (Jordan, Kinderlehrer and Otto) in the spirit of convex regularisation à la Rockafellar. Also representing ULA as an inexact gradient descent algorithm. Murray Pollock (Warwick) presented a new technique called fusion to simulate from products of d densities, as in scalable MCMC (but not only). With an (early) starting and startling remark that when simulating one realisation from each density in the product and waiting for all of them to be equal means simulating from the product, in a strong link to the (A)BC fundamentals. This is of course impractical and Murray proposes to follow d Brownian bridges all ending up in the average of these simulations, constructing an acceptance probability that is computable and validating the output.

The second “hand-on” lecture was given by Gareth Roberts (Warwick) on the many aspects of scaling MCMC algorithms, which started with the famous 0.234 acceptance rate paper in 1996. While I was aware of some of these results (!), the overall picture was impressive, including a notion of complexity I had not seen before. And a last section on PDMPs where Gareth presented very recent on the different scales of convergence of Zigzag and bouncy particle samplers, mostly to the advantage of Zigzag.In the afternoon, Jeremy Heng presented a continuous time version of simulated tempering by adding a drift to the Langevin diffusion with time-varying energy, which must be solution to the Liouville pde $\text{div} \pi_t f = \partial_t \pi_t$. Which connects to a flow transport problem when solving the pde under additional conditions. Unclear to me was the creation of the infinite sequence. This talk was very much at the interface in the spirit of the workshop! (Maybe surprisingly complex when considering the endpoint goal of simulating from a given target.) Jonathan Weare’s talk was about quantum chemistry which translated into finding eigenvalues of an operator. Turning in to a change of basis in a inhumanly large space (10¹⁸⁰ dimensions!). Matt Moore presented the work on Raman spectroscopy he did while a postdoc at Warwick, with an SMC based classification of the peaks of a spectrum (to be used on Mars?) and Alessandra Iacobucci (Dauphine) showed us the unexpected thermal features exhibited by simulations of chains of rotors subjected to both thermal and mechanical forcings, which we never discussed in Dauphine beyond joking on her many batch jobs running on our cluster!

And I remembered today that there is currently and in parallel another BIRS workshop on statistical model selection [and a lot of overlap with our themes] taking place in Banff! With snow already there! Unfair or rather #unfair, as someone much too well-known would whine..! Not that I am in a position to complain about the great conditions here in Oaxaca (except for having to truly worry about stray dogs rather than conceptually about bears makes running more of a challenge, if not the altitude since both places are about the same).

MCqMC 2018, Rennes [slides]

Posted in Statistics with tags , , , , , on July 3, 2018 by xi'an

Here are my slides for the talk I give this morning at MCqMC 20188. Based on slides first written by Changye Wu and on our joint papers. As it happens, I was under the impression I would give a survey on partially deterministic Markov processes. But, as it goes (!), my talk takes place after a superb plenary talk by Christophe Andrieu on non-reversibility, where he gave motivations for recoursing to non-reversibility and general results for variance reduction, plus a whole session on the topic by Jorens Bierkens, Alex Thiéry, Alain Durmus, and Arnak Dalalyan (CREST), which covered the topics in the following slides, only better! Reducing the informative contents of my talk to the alternative to the Zig-Zag sampler Changye proposed, which makes the talk of limited appeal, I am afraid. (There are four other sessions at the same time, fortunately!)

Bouncing bouncy particle papers

Posted in Books, pictures, Statistics, University life with tags , , , , on July 27, 2017 by xi'an

Yesterday, two papers on bouncy particle samplers simultaneously appeared on arXiv, arxiv:1707.05200 by Chris Sherlock and Alex Thiery, and arxiv:1707.05296 by Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet. As a coordinated move by both groups of authors who had met the weeks before at the Isaac Newton Institute in Cambridge.

The paper by Sherlock and Thiery, entitled a discrete bouncy particle sampler, considers a delayed rejection approach that only requires point-wise evaluations of the target density. The delay being into making a speed flip move after a proposal involving a flip in the speed and a drift in the variable of interest is rejected. To achieve guaranteed ergodicity, they add a random perturbation as in our recent paper, plus another perturbation based on a Brownian argument. Given that this is a discretised version of the continuous-time bouncy particle sampler, the discretisation step δ need be calibrated. The authors follow a rather circumvoluted argument to argue in favour of seeking a maximum number of reflections (for which I have obviously no intuition). Overall, I find it hard to assess how much of an advance this is, even when simulations support the notion of a geometric convergence.

“Our results provide a cautionary example that in certain high-dimensional scenarios, it is still preferable to perform refreshment even when randomized bounces are used.” Vanetti et al.

The paper by Paul Vanetti and co-authors has a much more ambitious scale in that it unifies most of the work done so far in this area and relates piecewise deterministic processes, Hamiltonian Monte Carlo, and discrete versions, containing on top fine convergence results. The main idea is to improve upon the existing deterministic methods by taking (more) into account the target density. Hence the use of a bouncy particle sampler associated with the Hamiltonian (as in HMC). This borrows from an earlier slice sampler idea of Iain Murray, Ryan Adams, and David McKay (AISTATS 2010), exploiting an exact Hamiltonian dynamics for an approximation to the true target to explore its support. Except that bouncing somewhat avoids the slice step. The [eight] discrete bouncy particle particle samplers derived from this framework are both correct against the targeted distribution and do not require the simulation of event times. The paper distinguishes between global and local versions, the later exploiting conditional independence properties in the (augmented) target. Which sounds like a version of multiple slice sampling.

forward event-chain Monte Carlo

Posted in Statistics, Travel, University life with tags on July 24, 2017 by xi'an

One of the authors of this paper contacted me to point out their results arXived last February [and revised last month] as being related to our bouncy particle paper arXived two weeks ago. And to an earlier paper by Michel et al. (2014) published in the Journal of Chemical Physics. (The authors actually happen to work quite nearby, on a suburban road I take every time I bike to Dauphine!) I think one reason we missed this paper in our literature survey is the use of a vocabulary taken from Physics rather than our Monte Carlo community, as in, e.g., using “event chain” instead of “bouncy particle”… The paper indeed contains schemes similar to ours, as did the on-going work by Chris Sherlock and co-authors Chris presented last week at the Isaac Newton Institute workshop on scalability. (Although I had troubles reading its physics style, in particular the justification for stationarity or “global balance” and the use of “infinitesimals”.)

“…we would like to find the optimal set of directions {e} necessary for the ergodicity and  allowing for an efficient exploration of the target distribution.”

The improvement sought is about improving the choice of the chain direction at each direction change. In order to avoid the random walk behaviour. The proposal is to favour directions close to the gradient of the log-likelihood, keeping the orthogonal to this gradient constant in direction (as in our paper) if not in scale. (As indicated above I have trouble understanding the ergodicity proof, if not the irreducibility. I also do not see how solving (11), which should be (12), is feasible in general. And why (29) amounts to simulating from (27)…)

generalised bouncy particle sampler

Posted in Statistics with tags on June 19, 2017 by xi'an

My PhD student, Wu Changye, just completed a paper on an extension of the bouncy particle sampler, to which he associated me as we had discussed the contents and especially the proofs together, although I did not participate in the redaction of the paper. The bouncy particle sampler belongs to the collection of piecewise deterministic samplers, which also contains the recent zig-zag sampler of Joris Bierkens, Paul Fearnhead, and Gareth Roberts (Warwick).  It was introduced by Alexandre Bouchard-Côté, Sebastian Vollmer (Warwick) and Arnaud Doucet, to appear in JASA, and uses one particle that is characterised by the pair (position, velocity), the velocity only changing at random times determined by the target density. The change is also deterministic. Which may lead to a lack of irreducibility in the process and is solved by an extra Poisson process in the original paper. In the generalisation imagined by Changye, the bounce becomes partly random in the direction orthogonal to the gradient. The stationarity results of Bouchard-Côté, Vollmer and Doucet then extend to this setting. As does the ability to subsample and compute faster versions of the likelihood function.