computational statistics and molecular simulation [18w5023]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , on November 14, 2018 by xi'an

On Day 2, Carsten Hartmann used a representation of the log cumulant as solution to a minimisation problem over a collection of importance functions (by the Vonsker-Varadhan principle), with links to X entropy and optimal control, a theme also considered by Alain Dunmus when considering the uncorrected discretised Langevin diffusion with a decreasing sequence of discretisation scale factors (Jordan, Kinderlehrer and Otto) in the spirit of convex regularisation à la Rockafellar. Also representing ULA as an inexact gradient descent algorithm. Murray Pollock (Warwick) presented a new technique called fusion to simulate from products of d densities, as in scalable MCMC (but not only). With an (early) starting and startling remark that when simulating one realisation from each density in the product and waiting for all of them to be equal means simulating from the product, in a strong link to the (A)BC fundamentals. This is of course impractical and Murray proposes to follow d Brownian bridges all ending up in the average of these simulations, constructing an acceptance probability that is computable and validating the output.

The second “hand-on” lecture was given by Gareth Roberts (Warwick) on the many aspects of scaling MCMC algorithms, which started with the famous 0.234 acceptance rate paper in 1996. While I was aware of some of these results (!), the overall picture was impressive, including a notion of complexity I had not seen before. And a last section on PDMPs where Gareth presented very recent on the different scales of convergence of Zigzag and bouncy particle samplers, mostly to the advantage of Zigzag.In the afternoon, Jeremy Heng presented a continuous time version of simulated tempering by adding a drift to the Langevin diffusion with time-varying energy, which must be solution to the Liouville pde $\text{div} \pi_t f = \partial_t \pi_t$. Which connects to a flow transport problem when solving the pde under additional conditions. Unclear to me was the creation of the infinite sequence. This talk was very much at the interface in the spirit of the workshop! (Maybe surprisingly complex when considering the endpoint goal of simulating from a given target.) Jonathan Weare’s talk was about quantum chemistry which translated into finding eigenvalues of an operator. Turning in to a change of basis in a inhumanly large space (10¹⁸⁰ dimensions!). Matt Moore presented the work on Raman spectroscopy he did while a postdoc at Warwick, with an SMC based classification of the peaks of a spectrum (to be used on Mars?) and Alessandra Iacobucci (Dauphine) showed us the unexpected thermal features exhibited by simulations of chains of rotors subjected to both thermal and mechanical forcings, which we never discussed in Dauphine beyond joking on her many batch jobs running on our cluster!

And I remembered today that there is currently and in parallel another BIRS workshop on statistical model selection [and a lot of overlap with our themes] taking place in Banff! With snow already there! Unfair or rather #unfair, as someone much too well-known would whine..! Not that I am in a position to complain about the great conditions here in Oaxaca (except for having to truly worry about stray dogs rather than conceptually about bears makes running more of a challenge, if not the altitude since both places are about the same).

congratulations, Dr. Wu!

Posted in pictures, Statistics, University life with tags , , , , , on October 4, 2018 by xi'an

This afternoon, my (now former) PhD student Changye Wu defended his thesis on Accelerated methods for MCMC, for which the jury awarded him the title of Docteur de l’Université Paris Dauphine. Congratulations to him and best wishes for his job hunting!

coordinate sampler as a non-reversible Gibbs-like MCMC sampler

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , on September 12, 2018 by xi'an

In connection with the talk I gave last July in Rennes for MCqMC 2018, I posted yesterday a preprint on arXiv of the work that my [soon to defend!] Dauphine PhD student Changye Wu and I did on an alternative PDMP. In this novel avatar of the zig-zag sampler,  a  non-reversible, continuous-time MCMC sampler, that we called the Coordinate Sampler, based on a piecewise deterministic Markov process. In addition to establishing the theoretical validity of this new sampling algorithm, we show in the same line as Deligiannidis et al.  (2018) that the Markov chain it induces exhibits geometrical ergodicity for distributions which tails decay at least as fast as an exponential distribution and at most as fast as a Gaussian distribution. A few numerical examples (a 2D banana shaped distribution à la Haario et al., 1999, strongly correlated high-dimensional normals, a log-Gaussian Cox process) highlight that our coordinate sampler is more efficient than the zig-zag sampler, in terms of effective sample size.Actually, we had sent this paper before the summer as a NIPS [2018] submission, but it did not make it through [the 4900 submissions this year and] the final review process, being eventually rated above the acceptance bar but not that above!

ICM 2018

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , on August 4, 2018 by xi'an

While I am not following the International Congress of Mathematicians which just started in Rio, and even less attending, I noticed an entry on their webpage on my friend and colleague Maria Esteban which I would have liked to repost verbatim but cannot figure how. (ICM 2018 also features a plenary lecture by Michael Jordan on gradient based optimisation [which was also Michael’s topic at ISBA 2018] and another one by Sanjeev Arora on the maths deep learning, two talks broadly related with statistics, which is presumably a première at this highly selective maths conference!)

Markov chain importance sampling

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on May 31, 2018 by xi'an

Ingmar Schuster (formerly a postdoc at Dauphine and now in Freie Universität Berlin) and Ilja Klebanov (from Berlin) have recently arXived a paper on recycling proposed values in [a rather large class of] Metropolis-Hastings and unadjusted Langevin algorithms. This means using the proposed variates of one of these algorithms as in an importance sampler, with an importance weight going from the target over the (fully conditional) proposal to the target over the marginal stationary target. In the Metropolis-Hastings case, since the later is not available in most setups, the authors suggest using a Rao-Blackwellised nonparametric estimate based on the entire MCMC chain. Or a subset.

“Our estimator refutes the folk theorem that it is hard to estimate [the normalising constant] with mainstream Monte Carlo methods such as Metropolis-Hastings.”

The paper thus brings an interesting focus on the proposed values, rather than on the original Markov chain,  which naturally brings back to mind the derivation of the joint distribution of these proposed values we made in our (1996) Rao-Blackwellisation paper with George Casella. Where we considered a parametric and non-asymptotic version of this distribution, which brings a guaranteed improvement to MCMC (Metropolis-Hastings) estimates of integrals. In subsequent papers with George, we tried to quantify this improvement and to compare different importance samplers based on some importance sampling corrections, but as far as I remember, we only got partial results along this way, and did not cover the special case of the normalising constant Þ… Normalising constants did not seem such a pressing issue at that time, I figure. (A Monte Carlo 101 question: how can we be certain the importance sampler offers a finite variance?)

I think this is interesting future work. My intuition is that for Metropolis-Hastings importance sampling with random walk proposals, the variance is guaranteed to be finite because the importance distribution ρ_θ is a convolution of your target ρ with the random walk kernel q. This guarantees that the tails of ρ_θ are no lighter than those of ρ. What other forms of q mean for the tails of ρ_θ I have less intuition about.

When considering the Langevin alternative with transition (4), I was first confused and thought it was incorrect for moving from one value of Y (proposal) to the next. But that’s what unadjusted means in “unadjusted Langevin”! As pointed out in the early Langevin literature, e.g., by Gareth Roberts and Richard Tweedie, using a discretised Langevin diffusion in an MCMC framework means there is a risk of non-stationarity & non-ergodicity. Obviously, the corrected (MALA) version is more delicate to approximate (?) but at the very least it ensures the Markov chain does not diverge. Even when the unadjusted Langevin has a stationary regime, its joint distribution is likely quite far from the joint distribution of a proper discretisation. Now this also made me think about a parameterised version in the 1996 paper spirit, but there is nothing specific about MALA that would prevent the implementation of the general principle. As for the unadjusted version, the joint distribution is directly available.  (But not necessarily the marginals.)

Personally, I think the most interesting part is the practical performance gain in terms of estimation accuracy for fixed CPU time, combined with the convergence guarantee from the CLT. ULA was particularly important to us because of the papers of Arnak Dalalyan, Alain Durmus & Eric Moulines and recently from Mike Jordan’s group, which all look at an unadjusted Langevin diffusion (and unimodal target distributions). But MALA admits a Metropolis-Hastings importance sampling estimator, just as Random Walk Metropolis does – we didn’t include MALA in the experiments to not get people confused with MALA and ULA. But there is no delicacy involved whatsoever in approximating the marginal MALA proposal distribution. The beauty of our approach is that it works for almost all Metropolis-Hastings algorithms where you can evaluate the proposal density q, there is no constraint to use random walks at all (we will emphasize this more in the paper).

in the street for a year

Posted in Mountains, pictures, Travel, University life with tags , , , , , , , , , , on April 13, 2018 by xi'an

Just like about every year, I sent two of my pictures to the photography competition of Paris Dauphine, with not much consideration for the theme “green the future”, and was hence quite surprised to get selected this time! (Almost as much surprised as last year when an almost perfect copy of my picture of the Alcazar Baths of Lady María de Padilla got selected!) As I could travel back from Oxford to attend the opening ceremony, I went there last night, wondering at which of my pictures had been selected (Lac Pavin, Auvergne versus the Quinrang, Skye)…

And so this picture will remain exposed in the street, boulevard Lannes, for the incoming year, meaning I will cross it each time I bike to the university! The 22 other pictures were more in tune with the theme of a green future, like the winning one of a fast moving métro carriage at the station Chemin Vert. Or this simple blade of grass growing from ashes…

And thus the winner is… Continue reading

accelerating MCMC

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , on April 11, 2018 by xi'an

As forecasted a rather long while ago (!), I wrote a short and incomplete survey on some approaches to accelerating MCMC. With the massive help of Victor Elvira (Lille), Nick Tawn (Warwick) and Changye Wu (Dauphine). Survey which current version just got arXived and which has now been accepted by WIREs Computational Statistics. The typology (and even the range of methods) adopted here is certainly mostly arbitrary, with suggestions for different divisions made by a very involved and helpful reviewer. While we achieved a quick conclusion to the review process, suggestions and comments are most welcome! Even if we cannot include every possible suggestion, just like those already made on X validated. (WIREs stands for Wiley Interdisciplinary Reviews and its dozen topics cover several fields, from computational stats to biology, to medicine, to engineering.)