**T**he last day of the X fertilisation workshop at the Casa Matemática Oaxaca, there were only three talks and only half of the participants. I lost the subtleties of the first talk by Andrea Agazzi on large deviations for chemical reactions, due to an emergency at work (Warwick). The second talk by Igor Barahona was somewhat disconnected from the rest of the conference, working on document textual analysis by way of algebraic data analysis (analyse des données) methods à la Benzécri. (Who was my office neighbour at Jussieu in the early 1990s.) In the final talk, Eric Vanden-Eijnden made a link between importance sampling and PDMP, as an integral can be expressed via a trajectory of a path. A generalisation of path sampling, for almost any ODE. But also a competitor to nested sampling, waiting for the path to reach a Hamiltonian level, without some of the difficulties plaguing nested sampling like resampling. And involving continuous time processes. (Is there a continuous time version of ABC as well?!) Returning unbiased estimators of mean (the original integral) and variance. Illustrated by a mixture example in dimension d=10 with k=50 components using only 100 paths.

## Archive for computational statistics

## computational statistics and molecular simulation [18w5023]

Posted in pictures, Statistics, Travel, University life with tags 18w5023, Benzécri, BIRS, Casa Matemática Oaxaca, CMO, computational statistics, HMC, Jussieu, Mexico, molecular dynamics, Monte Carlos Statistical Methods, nested sampling, numerical integrator, path sampling, workshop on November 19, 2018 by xi'an

## computational statistics and molecular simulation [18w5023]

Posted in pictures, Statistics, Travel, University life with tags 18w5023, BIRS, Casa Matemática Oaxaca, CMO, computational statistics, HMC, hypocoercivity, Institut Henri Poincaré, Mexico, molecular dynamics, Monte Carlos Statistical Methods, overdamped Langevin algorithm, PDMP, workshop on November 16, 2018 by xi'an

**T**his Thursday, our X fertilisation workshop at the interface between molecular dynamics and Monte Carlo statistical methods saw a wee bit of reduction in the audience as some participants had already left Oaxaca. Meaning they missed the talk of Christophe Andrieu on hypocoercivity which could have been another hands-on lecture, given the highly pedagogical contents of the talk. I had seen some parts of the talk at MCqMC 2018 in Rennes and at NUS, but still enjoyed the whole of it very much, and so did the audience given the induced discussion. For instance, previously, I had not seen the connection between the guided random walks of Gustafson and Diaconis, and continuous time processes like PDMP. Which Christophe also covered in his talk. (Also making me realise my colleague Jean Dolbeault in Dauphine was strongly involved in the theoretical analysis of PDMPs!) Then Samuel Power gave another perspective on PDMPs. With another augmentation, connected with time, what he calls trajectorial reversibility. This has the impact of diminishing the event rate, but creates some kind of reversibility which seems to go against the motivation for PDMPs. (Remember that all talks are available as videos on the BIRS webpage.) A remark in the talk worth reiterating is the importance of figuring out which kinds of approximations are acceptable in these settings. Connecting somewhat with the next talk by Luc Rey-Bellet on a theory of robust approximations. In the sense of Poincaré, Gibbs, Bernstein, &tc. concentration inequalities and large deviations. With applications to rare events.

The fourth and final “hands-on” session was run by Miranda Holmes-Cerfon on simulating under constraints. Motivated by research on colloids. For which the overdamped Langevin diffusion applies as an accurate model, surprisingly. Which makes a major change from the other talks [most of the workshop!] relying on this diffusion. (With an interesting interlude on molecular velcro made of DNA strands.) Connected with this example, exotic energy landscapes are better described by hard constraints. (Potentially interesting extension to the case when there are too many constraints to explore all of them?) Now, the definition of the measure projected on the manifold defined by the constraints is obviously an important step in simulating the distribution, whose density is induced by the gradient of the constraints ∇q(x). The proposed algorithm is in the same spirit as the one presented by Tony the previous day, namely moving along the tangent space then on the normal space to get back to the manifold. A solution that causes issues when the gradient is (near) zero. A great hands-on session which induced massive feedback from the audience.
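The tangent-then-normal move can be sketched on a toy case. What follows is only an illustration under stated assumptions, not the algorithm from the talk: the manifold is taken to be the unit circle, with a hypothetical constraint q(x) = x₁² + x₂² − 1, the names `tangent_then_normal`, `step` and `tol` are made up for the occasion, and only the proposal is shown (a valid sampler would further need an accept-reject step and a reversibility check).

```python
import math
import random

def q(x):
    # Toy constraint (an assumption): the unit circle is the zero set of q.
    return x[0] ** 2 + x[1] ** 2 - 1.0

def grad_q(x):
    # Gradient of the constraint, spanning the normal direction at x.
    return (2.0 * x[0], 2.0 * x[1])

def tangent_then_normal(x, step, rng, tol=1e-12, max_iter=50):
    """Move along the tangent space at x, then along the normal at x
    until the constraint holds again. Returns None on failure."""
    g = grad_q(x)
    g2 = g[0] ** 2 + g[1] ** 2  # breaks down when the gradient is (near) zero
    # Gaussian draw with its normal component removed: a tangent move.
    v = (rng.gauss(0.0, step), rng.gauss(0.0, step))
    c = (v[0] * g[0] + v[1] * g[1]) / g2
    y = (x[0] + v[0] - c * g[0], x[1] + v[1] - c * g[1])
    # Newton iterations in a for q(y + a * g) = 0.
    a = 0.0
    for _ in range(max_iter):
        z = (y[0] + a * g[0], y[1] + a * g[1])
        r = q(z)
        if abs(r) < tol:
            return z
        gz = grad_q(z)
        slope = gz[0] * g[0] + gz[1] * g[1]
        if abs(slope) < 1e-14:
            return None
        a -= r / slope
    return None  # no projection found, e.g. when the tangent move is too long

rng = random.Random(0)
x = (1.0, 0.0)
for _ in range(200):
    z = tangent_then_normal(x, 0.3, rng)
    if z is not None:
        x = z
```

The division by `g2` and by `slope` is where a vanishing gradient would hurt, and a `None` return corresponds to a tangent move so long that no point of the manifold lies along the normal direction.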

In the afternoon session, Gersende Fort gave a talk on a generalisation of the Wang-Landau algorithm, which modifies the true weights of the elements of a partition of the sampling space, to increase visits to low [probability] elements and jumps between modes. The idea is to rely on tempered versions of the original weights, learned by stochastic approximation. With an extra layer of adaptivity. Leading to an improvement with parameters that depend on the phase of the stochastic approximation. The second talk was by David Sanders on a recent paper in *Chaos* about importance sampling for rare events of (deterministic) billiard dynamics. With diffusive limits whose tails are hard to evaluate, except by importance sampling. And the last talk of the day was by Anton Martinsson on simulated tempering for a molecular alignment problem. With weights of different temperatures proportional to the inverse of the corresponding normalising constants, which themselves can be learned by a form of bridge sampling if I got it right.
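Why inverse normalising constants make sense as simulated tempering weights can be checked exactly on a toy discrete space. This is a sketch under assumptions, with made-up energies and inverse temperatures and a hypothetical helper `tempering_marginal`: the joint target over (state, temperature index) is taken proportional to wᵢ exp(−βᵢ U(x)), so the marginal weight of temperature i is proportional to wᵢ Zᵢ.

```python
import math

def tempering_marginal(energies, betas, weights):
    # Marginal distribution of the temperature index under the joint
    # target proportional to weights[i] * exp(-betas[i] * U(x)).
    mass = [w * sum(math.exp(-b * u) for u in energies)
            for b, w in zip(betas, weights)]
    total = sum(mass)
    return [m / total for m in mass]

# Hypothetical toy problem: four states and three inverse temperatures.
energies = [0.0, 1.0, 2.5, 4.0]
betas = [1.0, 0.5, 0.1]

# Normalising constants, available in closed form on this toy space only.
Z = [sum(math.exp(-b * u) for u in energies) for b in betas]

flat = tempering_marginal(energies, betas, [1.0, 1.0, 1.0])
balanced = tempering_marginal(energies, betas, [1.0 / z for z in Z])
```

With constant weights the chain would spend most of its time at the hottest temperature (largest Zᵢ), while weights proportional to 1/Zᵢ make every temperature equally visited. In realistic problems the Zᵢ are unknown, hence the need to learn them along the way.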

On a very minor note, I heard at breakfast a pretty good story from a fellow participant having to give a talk at a conference that was moved to a very early time in the morning, due to an official appearing at a later time, and as a result “enjoying” a very small audience. To the point that a cleaning lady appeared and started cleaning the board as she could not conceive the talks had already started! Reminding me of this picture at IHP.

## computational statistics and molecular simulation [18w5023]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags 18w5023, ABC, BIRS, Casa Matemática Oaxaca, CMO, computational statistics, crown of thorns, gerrymandering, HMC, killer robot, lead climbing, leapfrog integrator, Mexico, misspecified model, molecular dynamics, Monte Carlos Statistical Methods, Moreau-Yoshida, numerical integrator, overdamped Langevin algorithm, proximal optimisation, reversible jump MCMC, rock climbing, starfish, summary statistics, transferability, workshop on November 15, 2018 by xi'an

**I** truly missed the gist of the first talk of the Wednesday morning of our X fertilisation workshop by Jianfeng Lu, partly due to notations, although the topic very much correlated with my interests like path sampling, with an augmented version of HMC using an auxiliary indicator. And mentions made of BAOAB. Next, Marcelo Pereyra spoke about Bayesian image analysis, with the difficulty of setting a prior on an image. In the case of astronomical images there are motivations for an L¹ penalisation sparse prior. Sampling is an issue. Moreau-Yosida proximal optimisation is used instead, in connection with our MCMC survey published in Stats & Computing two years ago. *Transferability* was a new concept for me, as introduced by Kerrie Mengersen (QUT), to extrapolate an estimated model to another system without using the posterior as a prior. With a great interlude about the crown of thorns starfish killer robot! Rather a prior determination based on historical data, in connection with recent (2018) Technometrics and Bayesian Analysis papers towards rejecting non-plausible priors. Without reading the papers (!), and before discussing the matter with Kerrie, here or in Marseille, I wonder at which level of precision this can be conducted. The use of summary statistics for prior calibration gave the approach an ABC flavour.
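For readers unfamiliar with the proximal vocabulary, here is a minimal one-dimensional sketch of what a Moreau-Yosida regularisation of an L¹ penalty looks like. It is a textbook illustration, not the scheme of the talk: the function names and the smoothing parameter `lam` are assumptions made for this example.

```python
def prox_l1(x, lam):
    # Proximal operator of lam * |x|, i.e. soft thresholding at level lam.
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def moreau_yosida_l1(x, lam):
    # Envelope: minimum over u of |u| + (x - u)^2 / (2 * lam),
    # the minimiser being u = prox_l1(x, lam).
    u = prox_l1(x, lam)
    return abs(u) + (x - u) ** 2 / (2.0 * lam)

def grad_moreau_yosida_l1(x, lam):
    # The envelope is differentiable everywhere, with this gradient.
    return (x - prox_l1(x, lam)) / lam
```

The envelope is quadratic near zero and linear further away, which removes the kink of |x| at the origin and makes gradient-based samplers or optimisers applicable to the smoothed penalty.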

The hands-on session was Jonathan Mattingly’s discussion of gerrymandering reflecting on his experience at court! Hard to beat for an engaging talk reaching between communities. As it happens I discussed the original paper last year. Of course it was much more exciting to listen to Jonathan explaining his vision of the problem! Too bad I “had” to leave before the end for a [most enjoyable] rock climbing afternoon… To be continued at the dinner table! (Plus we got the complete explanation of the term gerrymandering, including this salamander rendering of the first district identified as gerrymandered!)

## computational statistics and molecular simulation [18w5023]

Posted in pictures, Statistics, Travel, University life with tags 18w5023, Banff, Banff International Research Station for Mathematical Innovation, BIRS, bouncy particle sampler, Casa Matemática Oaxaca, CMO, computational statistics, Donsker-Varadhan, eigenvalue, local scaling, Mars, Mexico, molecular dynamics, Monte Alban, Monte Carlo Statistical Methods, optimal acceptance rate, PDMP, spectroscopy, tempering, Université Paris Dauphine, workshop, Zapotec civilization, Zig-Zag on November 14, 2018 by xi'an

**O**n Day 2, Carsten Hartmann used a representation of the log cumulant as solution to a minimisation problem over a collection of importance functions (by the Donsker-Varadhan principle), with links to X entropy and optimal control, a theme also considered by Alain Durmus when considering the uncorrected discretised Langevin diffusion with a decreasing sequence of discretisation scale factors (Jordan, Kinderlehrer and Otto) in the spirit of convex regularisation à la Rockafellar. Also representing ULA as an inexact gradient descent algorithm. Murray Pollock (Warwick) presented a new technique called fusion to simulate from products of d densities, as in scalable MCMC (but not only). With an (early) starting and startling remark that simulating one realisation from each density in the product and waiting for all of them to be equal means simulating from the product, in a strong link to the (A)BC fundamentals. This is of course impractical and Murray proposes to follow d Brownian bridges all ending up in the average of these simulations, constructing an acceptance probability that is computable and validating the output.
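The starting remark behind fusion is easy to check in a discrete setting, where waiting for all draws to coincide is actually feasible. The sketch below is a toy illustration only (it has nothing of the Brownian bridge construction), and the two probability vectors and the function names are assumptions chosen for the example.

```python
import random

def coincidence_sampler(pmfs, n_trials, rng):
    """Draw one value from each pmf and keep the common value when all
    draws agree. Returns the frequencies of kept values and the keep rate."""
    support = list(range(len(pmfs[0])))
    counts = [0] * len(support)
    for _ in range(n_trials):
        draws = [rng.choices(support, weights=p)[0] for p in pmfs]
        if all(d == draws[0] for d in draws):
            counts[draws[0]] += 1
    kept = sum(counts)
    return [c / kept for c in counts], kept / n_trials

def normalised_product(pmfs):
    # Pointwise product of the pmfs, renormalised to sum to one.
    prod = [1.0] * len(pmfs[0])
    for p in pmfs:
        prod = [a * b for a, b in zip(prod, p)]
    total = sum(prod)
    return [a / total for a in prod]

# Hypothetical pmfs on {0, 1, 2}.
pmfs = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
freqs, rate = coincidence_sampler(pmfs, 20000, random.Random(1))
target = normalised_product(pmfs)
```

The frequencies in `freqs` should be close to `target`, while `rate` shows the price paid: the probability of coincidence shrinks with the number of terms in the product and is zero for continuous densities, hence the impracticality mentioned above.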

The second “hands-on” lecture was given by Gareth Roberts (Warwick) on the many aspects of scaling MCMC algorithms, which started with the famous 0.234 acceptance rate paper in 1996. While I was aware of some of these results (!), the overall picture was impressive, including a notion of complexity I had not seen before. And a last section on PDMPs where Gareth presented very recent results on the different scales of convergence of Zigzag and bouncy particle samplers, mostly to the advantage of Zigzag.

In the afternoon, Jeremy Heng presented a continuous time version of simulated tempering by adding a drift to the Langevin diffusion with time-varying energy, which must be solution to the Liouville PDE. Which connects to a flow transport problem when solving the PDE under additional conditions. Unclear to me was the creation of the infinite sequence. This talk was very much at the interface in the spirit of the workshop! (Maybe surprisingly complex when considering the endpoint goal of simulating from a given target.) Jonathan Weare’s talk was about quantum chemistry which translated into finding eigenvalues of an operator. Turning into a change of basis in an inhumanly large space (10¹⁸⁰ dimensions!). Matt Moore presented the work on Raman spectroscopy he did while a postdoc at Warwick, with an SMC based classification of the peaks of a spectrum (to be used on Mars?) and Alessandra Iacobucci (Dauphine) showed us the unexpected thermal features exhibited by simulations of chains of rotors subjected to both thermal and mechanical forcings, which we never discussed in Dauphine beyond joking on her many batch jobs running on our cluster!
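The 0.234 figure from the scaling lecture can be observed numerically. Here is a minimal sketch, assuming a standard Gaussian target in dimension d and a random walk Metropolis proposal with scale 2.38/√d (the usual asymptotic recommendation); the function name and default values are choices made for this illustration.

```python
import math
import random

def rwm_acceptance_rate(d, n_iter, rng, ell=2.38):
    """Empirical acceptance rate of random walk Metropolis on a standard
    Gaussian target in dimension d, with proposal scale ell / sqrt(d)."""
    sigma = ell / math.sqrt(d)
    # Start from the target itself so that no burn-in is needed.
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    log_pi = -0.5 * sum(v * v for v in x)
    accepted = 0
    for _ in range(n_iter):
        y = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        log_pi_y = -0.5 * sum(v * v for v in y)
        # Metropolis acceptance probability min(1, pi(y) / pi(x)).
        if rng.random() < math.exp(min(0.0, log_pi_y - log_pi)):
            x, log_pi = y, log_pi_y
            accepted += 1
    return accepted / n_iter

rate = rwm_acceptance_rate(50, 4000, random.Random(0))
```

In moderate dimension the empirical rate sits near (usually slightly above) 0.234, the limit holding as d grows. Changing `ell` moves the rate away from this value in either direction.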

And I remembered today that there is currently and in parallel another BIRS workshop on statistical model selection [and a lot of overlap with our themes] taking place in Banff! With snow already there! Unfair or rather #unfair, as someone much too well-known would whine..! Not that I am in a position to complain about the great conditions here in Oaxaca (except that having to truly worry about stray dogs, rather than conceptually about bears, makes running more of a challenge, if not the altitude since both places are about the same).

## computational statistics and molecular simulation [18w5023]

Posted in Statistics with tags 18w5023, BIRS, Casa Matemática Oaxaca, CMO, computational statistics, HMC, leapfrog integrator, Mexico, misspecified model, molecular dynamics, Monte Carlos Statistical Methods, numerical integrator, overdamped Langevin algorithm, reversible jump MCMC, workshop on November 13, 2018 by xi'an

**T**his X fertilisation workshop Gabriel Stoltz, Luke Bornn and myself organised towards reinforcing the interface between molecular dynamics and Monte Carlo statistical methods has now started! At the Casa Matemática Oaxaca, the Mexican campus of BIRS, which is currently housed by a very nice hotel on the heights of Oaxaca. And after a fairly long flight for a large proportion of the participants. On the first day, Arthur Voter gave a fantastic “hands-on” review of molecular dynamics for material sciences, which was aimed at the statistician side of the audience and most helpful in my own understanding of the concepts and techniques at the source of HMC and PDMP algorithms. (Although I could not avoid a few mini dozes induced by jetlag.) Including the BAOAB version of HMC, which sounded to me like an improvement to investigate. The part on metastability, completed by a talk by Florian Maire, remained a wee bit mysterious [to me].
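Since BAOAB is only mentioned by name above, here is a minimal sketch of the splitting as it is usually written for (underdamped) Langevin dynamics, with B a half kick on the momentum, A a half drift on the position and O an exact Ornstein-Uhlenbeck update of the momentum. Assumptions for this illustration: unit mass, a one-dimensional harmonic potential U(q) = q²/2, and made-up parameter names (`h`, `gamma`, `kT`).

```python
import math
import random

def baoab_step(q, p, h, gamma, kT, grad_U, rng):
    # One BAOAB step for Langevin dynamics with unit mass.
    p -= 0.5 * h * grad_U(q)                 # B: half kick
    q += 0.5 * h * p                         # A: half drift
    c = math.exp(-gamma * h)                 # O: exact OU update of p
    p = c * p + math.sqrt(kT * (1.0 - c * c)) * rng.gauss(0.0, 1.0)
    q += 0.5 * h * p                         # A: half drift
    p -= 0.5 * h * grad_U(q)                 # B: half kick
    return q, p

def mean_square_position(n_steps, h, rng):
    # Average of q^2 along a trajectory for U(q) = q^2 / 2,
    # with friction gamma = 1 and temperature kT = 1.
    q, p, total = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        q, p = baoab_step(q, p, h, 1.0, 1.0, lambda x: x, rng)
        total += q * q
    return total / n_steps

estimate = mean_square_position(50000, 0.5, random.Random(0))
```

For this potential and kT = 1 the stationary variance of q is 1, so `estimate` should land close to that value even with the fairly large step h = 0.5. No Metropolis correction is included here, so how this relates to an HMC-type sampler is left open.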

The shorter talks of the day all brought new perspectives and information to me (although they were definitely more oriented towards their “own” side of the audience than the hands-on lecture). For instance, Jesús María Sanz-Serna gave a wide ranging overview of numerical integrators and Tony Lelièvre presented a recent work on simulating measures supported by manifolds via an HMC technique constantly projecting over the manifold, with proper validation. (I had struggled with the paper this summer and this talk helped a lot.) There was a talk by Josh Fass on simulating implicit solvent models that mixed high-level programming and reversible jump MCMC, with an earlier talk by Yong Chen on variable dimension hidden Markov models that could have also alluded to reversible jump. Angela Bitto talked about using ASIS (Ancillarity-sufficiency interweaving strategy) for improving the dynamics of an MCMC sampler associated with a spike & slab prior, the recentering-decentering cycle being always a sort of mystery to me [as to why it works better despite introducing multimodality in this case], and Gael Martin presented some new results on her on-going work with David Frazier about approximate Bayes with misspecified models, with the summary statistic being a score function that relates the work to the likelihood free approach of Bissiri et al.

## barbed WIREs

Posted in Books, Kids, University life with tags commercial editor, computational statistics, John Wiley, managing editor, revision, WIREs, WIREs Computational Statistics on July 14, 2018 by xi'an

**M**aybe childishly, I am fairly unhappy with the way the submission of our Accelerating MCMC review was handled by WIREs Computational Statistics, i.e., Wiley, at the production stage. For some reason or another, I sent the wrong bibTeX file with my LaTeX document [created using the style file imposed by WIREs]. Rather than pointing out the numerous missing entries, the production staff started working on the paper and sent us a proof with an endless list of queries related to these missing references. When I sent back the corrected LaTeX and bibTeX files, they answered back that it was too late to modify the files as it would “require re-work of [the] already processed paper which is also not a standard process for the journal”. Meaning in clearer terms that Wiley does not want to pay for any additional time spent on this paper and that I have to draw on my own “free” time to make up for this mess…