## Archive for PDMP

## coordinate sampler on-line

Posted in Statistics with tags coordinate sampler, Gibbs sampler, MCMC, non-reversible diffusion, PDMP, piecewise deterministic, Statistics & Computing on March 13, 2020 by xi'an## non-reversibility in discrete spaces

Posted in Books, Statistics, University life with tags birth-and-death process, coordinate sampler, JASA, jump process, Markov chain, non-reversible diffusion, PDMP, Peskun ordering, reversibility, Zig-Zag on January 3, 2020 by xi'an**F**ollowing a recent JASA paper by Giacomo Zanella (which I have not yet read but is discussed on this blog), Sam Power and Jacob Goldman have recently arXived a paper on Accelerated sampling on discrete spaces with non-reversible Markov processes, where they use continuous-time, non-reversible algorithms à la PDMP, even though differential equations do not exist on discrete spaces. More specifically, they devise discrete versions of the coordinate sampler and of the Zig-Zag sampler, using Markov jump processes instead of differential equations, with detailed balance on the jump rate rather than the Markov kernel. A use of jump processes originating at least from Peskun (1973) and connected with MCMC algorithms in Matthew Stephens‘ 1999 PhD thesis. A neat thing about discrete settings is that the jump process can be implemented with no discretisation! However, as we noticed when working on birth-and-death processes with Olivier Cappé and Tobias Rydèn, there is a potential for disastrous implementation if an infinite sequence of instantaneous moves (out of zero probability states) is proposed.

The authors make the further assumption(s) that the discrete space is endowed with a graphical structure with a group G acting upon this graph, with an involution keeping the target (or a completion of the original target) invariant. In this framework, reversibility amounts to repeatedly using (group) generators þ with a low order (as in Bayesian variable selection, binary spin systems, where þ.þ=id, and other permutation problems), since they bring the chain back to its starting point. Their first sampler is called a Tabu sampler for avoiding such behaviour, forcing the next step to use other generators þ in the generator set Þ thanks to a binary auxiliary variable that partitions Þ into forward vs backward moves. For high order generators, the discrete coordinate and Zig-Zag samplers are instead repeatedly using the same generator (although it is unclear to me why this is beneficial, given that neither graph nor generator is not necessarily linked with the target). With the coordinate sampler being again much cheaper since it only looks at one direction in the generator group.

The paper contains a range of comparisons with (only) Zanella’s sampler, some presenting heavy gains in terms of ESS. Including one on hundreds of sensors in a football stadium. As I am not particularly familiar with these examples, except for the Bayesian variable selection one, I found it rather hard to determine whether or not the compared samplers were indeed exploring the entirety of the (highly complex and highly dimensional) target. The collection of examples is however quite rich and support the use of such non-reversible schemes. It may also be that the discrete nature of the target could facilitate the theoretical study of their convergence properties.

## sampling and imbalanced

Posted in Statistics with tags big data, importance sampling, logistic regression, PDMP, Poisson process, Zig-Zag on June 21, 2019 by xi'an**D**eborshee Sen, Matthias Sachs, Jianfeng Lu and David Dunson have recently arXived a sub-sampling paper for classification (logistic) models where some covariates or some responses are imbalanced. With a PDMP, namely zig-zag, used towards preserving the correct invariant distribution (as already mentioned in an earlier post on the zig-zag zampler and in a recent Annals paper by Joris Bierkens, Paul Fearnhead, and Gareth Roberts (Warwick)). The current paper is thus an improvement on the above. Using (non-uniform) importance sub-sampling across observations and simpler upper bounds for the Poisson process. A rather practical form of Poisson thinning. And proposing unbiased estimates of the sub-sample log-posterior as well as stratified sub-sampling.

I idly wondered if the zig-zag sampler could itself be improved by not switching the bouncing directions at random since directions associated with almost certainly null coefficients should be neglected as much as possible, but the intensity functions associated with the directions do incorporate this feature. Except for requiring computation of the intensities for all directions. This is especially true when facing many covariates.

Thinking of the logistic regression model itself, it is sort of frustrating that something so close to an exponential family causes so many headaches! Formally, it is an exponential family but the normalising constant is rather unwieldy, especially when there are many observations and many covariates. The Polya-Gamma completion is a way around, but it proves highly costly when the dimension is large…

## scalable Metropolis-Hastings

Posted in Books, Statistics, Travel with tags delayed acceptance, Fukui-Todo procedure, Hamiltonian Monte Carlo, Langevin MCMC algorithm, PDMP, scalable MCMC, scaling, Taylor expansion, thinning, University of Oxford on February 12, 2019 by xi'an**A**mong the flury of arXived papers of last week (414!), including a fair chunk of papers submitted to ICML 2019, I spotted one entry by Cornish et al. on scalable Metropolis-Hastings, which Arnaud Doucet had mentioned to me yesterday when in Oxford. The paper builds on the delayed acceptance paper we wrote with Marco Banterlé, Clara Grazian and Anthony Lee, itself relying on a factorisation decomposition of the likelihood, combined with control variate accelerating techniques. The factorisation of both the target and the proposal allows for a (less efficient) Metropolis-Hastings acceptance ratio that is the product

of individual Metropolis-Hastings acceptance ratios, but which allows for quicker rejection if one of the probabilities in the product is small, because the corresponding Bernoulli draw is zero with high probability. One advance made in Michel et al. (2017) [which I doubly missed] is that subsampling is achievable by thinning (as in PDMPs, where these authors have been quite active) through an algorithm of Shantikumar (1985) [described in Devroye’s bible]. Provided each Metropolis-Hastings probability can be lower bounded:

by a term where the transition *φ* does not depend on the index *i* in the product. The computing cost of the thinning process thus depends on the efficiency of the subsampling, namely whether or not the (Poisson) number of terms is much smaller than m, number of terms in the product. A neat trick in the current paper that extends the the Fukui-Todo procedure is to switch to the original Metropolis-Hastings when the overall lower bound is too small, recovering the geometric ergodicity of this original if it holds (**Theorem 2.1**). Another neat remark is that when using the naïve factorisation as the product of the n individual likelihoods, the resulting algorithm is sort of doomed as n grows, even with an optimal scaling of the proposals. To achieve scalability, the authors introduce a Taylor (i.e., Gaussian) approximation to each local target in the product and start the acceptance decomposition by using the resulting overall Gaussian approximation. Meaning that the remaining product is now made of ratios of targets over their local Taylor approximations, hence most likely close to one. And potentially lower-bounded by the remainder term in the Taylor expansion. Leading to the conclusion that, when everything goes well, meaning that the Taylor expansions can be conducted and the bounds derived for the appropriate expansion, the order of the Poisson scale is O(1/√n)..! The proposal for the Metropolis-Hastings move is actually tuned to the Gaussian approximation, appearing as a variant of the Langevin move or more exactly a discretization of an Hamiltonian move. Obviously, I cannot judge of the complexity in implementing this new scheme from just reading the paper, but this development on the split target is definitely an exciting prospect for handling huge datasets and their friends!

## irreversible Markov chains

Posted in Books, pictures, Statistics, University life with tags Bernie Alder, Ecole Normal Supérieure, factorised Metropolis, Nicolas Metropolis, PDMP, phase transition, prefetching, seminar, torus, Université Paris Dauphine on November 20, 2018 by xi'anWerner Krauth (ENS, Paris) was in Dauphine today to present his papers on irreversible Markov chains at the probability seminar. He went back to the 1953 Metropolis et al. paper. And mentioned a 1962 paper I had never heard of by Alder and Wainwright demonstrating phase transition can occur, via simulation. The whole talk was about simulating the stationary distribution of a large number of hard spheres on a one-dimensional ring, which made it hard for me to understand. (Maybe the triathlon before did not help.) And even to realise a part was about PDMPs… His slides included this interesting entry on factorised MCMC which reminded me of delayed acceptance and thinning and prefetching. Plus a notion of lifted Metropolis that could have applications in a general setting, if it differs from delayed rejection.

## computational statistics and molecular simulation [18w5023]

Posted in pictures, Statistics, Travel, University life with tags 18w5023, BIRS, Casa Matemática Oaxaca, CMO, computational statistics, HMC, hypocoercivity, Institut Henri Poincaré, Mexico, molecular dynamics, Monte Carlos Statistical Methods, overdamped Langevin algorithm, PDMP, workshop on November 16, 2018 by xi'an**T**his Thursday, our X fertilisation workshop at the interface between molecular dynamics and Monte Carlo statistical methods saw a wee bit of reduction in the audience as some participants had already left Oaxaca. Meaning they missed the talk of Christophe Andrieu on hypocoercivity which could have been another hand-on lecture, given the highly pedagogical contents of the talk. I had seen some parts of the talk in MCqMC 2018 in Rennes and at NUS, but still enjoyed the whole of it very much, and so did the audience given the induced discussion. For instance, previously, I had not seen the connection between the guided random walks of Gustafson and Diaconis, and continuous time processes like PDMP. Which Christophe also covered in his talk. (Also making me realise my colleague Jean Dolbeault in Dauphine was strongly involved in the theoretical analysis of PDMPs!) Then Samuel Power gave another perspective on PDMPs. With another augmentation, connected with time, what he calls trajectorial reversibility. This has the impact of diminishing the event rate, but creates some kind of reversibility which seems to go against the motivation for PDMPs. (Remember that all talks are available as videos on the BIRS webpage.) A remark in the talk worth reiterating is the importance of figuring out which kinds of approximations are acceptable in these approximations. Connecting somewhat with the next talk by Luc Rey-Bellet on a theory of robust approximations. In the sense of Poincaré, Gibbs, Bernstein, &tc. concentration inequalities and large deviations. With applications to rare events.The fourth and final “hand-on” session was run by Miranda Holmes-Certon on simulating under constraints. Motivated by research on colloids. For which the overdamp Langevin diffusion applies as an accurate model, surprisingly. Which makes a major change from the other talks [most of the workshop!] relying on this diffusion. (With an interesting intermede on molecular velcro made of DNA strands.) Connected with this example, exotic energy landscapes are better described by hard constraints. (Potentially interesting extension to the case when there are too many constraints to explore all of them?) Now, the definition of the measure projected on the manifold defined by the constraints is obviously an important step in simulating the distribution, which density is induced by the gradient of the constraints ∇q(x). The proposed algorithm is in the same spirit as the one presented by Tony the previous day, namely moving along the tangent space then on the normal space to get back to the manifold. A solution that causes issues when the gradient is (near) zero. A great hand-on session which induced massive feedback from the audience.

In the afternoon session, Gersende Fort gave a talk on a generalisation of the Wang-Landau algorithm, which modifies the true weights of the elements of a partition of the sampling space, to increase visits to low [probability] elements and jumps between modes. The idea is to rely on tempered versions of the original weights, learned by stochastic approximation. With an extra layer of adaptivity. Leading to an improvement with parameters that depends on the phase of the stochastic approximation. The second talk was by David Sanders on a recent paper in *Chaos* about importance sampling for rare events of (deterministic) billiard dynamics. With diffusive limits which tails are hard to evaluate, except by importance sampling. And the last talk of the day was by Anton Martinsson on simulated tempering for a molecular alignment problem. With weights of different temperatures proportional to the inverse of the corresponding normalising constants, which themselves can be learned by a form of bridge sampling if I got it right.

On a very minor note, I heard at breakfast a pretty good story from a fellow participant having to give a talk at a conference that was moved to a very early time in the morning due to an official appearing at a later time and as a result “enjoying” a very small audience to the point that a cleaning lady appeared and started cleaning the board as she could not conceive the talks had already started! Reminding me of this picture at IHP.