**D**ue to the new lockdown measures enforced in France and in particular in Marseilles, the CIRM workshop on QMC and randomness has turned virtual, and I will thus give my talk on ** Coordinate sampler : A non-reversible Gibbs-like sampler ** from Paris. Rather than from the Luminy campus after an early morning run to the top of Mont Puget as we used to do on the previous workshop there. With versions of PDMP running on QMC (which makes sense when considering the deterministic component of the sampler).

## Archive for Markov chain Monte Carlo

## away from CIRM

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags calanques, CIRM, coordinate sampler, Gibbs sampler, Luminy campus, Markov chain Monte Carlo, Marseille, Mont Puget, PDMP, SMF on November 5, 2020 by xi'an## Hastings at 50, from a Metropolis

Posted in Kids, pictures, Running, Travel with tags 50 miles, 50 years, Bayesian computation, Biometrika, Channel, East Sussex, H.G. Wells, Hastings, Hauts de France, importance sampling, jatp, Le Touquet Paris-Plage, Markov chain Monte Carlo, Metropolis-Hastings, P.G. Wodehouse, Picardy, posterior sampling, rejection sampling, Saint-Valéry-sur-Somme, sea, tempest, William the Conqueror, winter light on January 4, 2020 by xi'anA weekend trip to the quaint seaside city of Le Touquet Paris-Plage, facing the city of Hastings on the other side of the Channel, 50 miles away (and invisible on the pictures!), during and after a storm that made for a fantastic watch from our beach-side rental, if less for running! The town is far from being a metropolis, actually, but it got its added surname “Paris-Plage” from British investors who wanted to attract their countrymen in the late 1800s. The writers H.G. Wells and P.G. Wodehouse lived there for a while. (Another type of tourist, William the Conqueror, left for Hastings in 1066 from a wee farther south, near Saint-Valéry-sur-Somme.)

And the coincidental on-line publication in Biometrika of a 50 year anniversary paper, *The Hastings algorithm at fifty* by David Dunson and James Johndrow. More of a celebration than a comprehensive review, with focus on scalable MCMC, gradient based algorithms, Hamiltonian Monte Carlo, nonreversible Markov chains, and interesting forays into approximate Bayes. Which makes for a great read for graduate students and seasoned researchers alike!

## Markov Chains [not a book review]

Posted in Books, pictures, Statistics, University life with tags book review, concentration inequalities, coupling, Eric Moulines, irreducibility, Markov chain and stochastic stability, Markov chain Monte Carlo, Markov chains, MCMC convergence, probability theory, Randal Douc, Richard Tweedie, Sean Meyn, Wasserstein distance on January 14, 2019 by xi'an**A**s Randal Douc and Éric Moulines are both very close friends and two authors of this book on Markov chains, I cannot engage into a regular book review! Judging from the table of contents, the coverage is not too dissimilar to the now classic Markov chain Stochastic Stability book by Sean Meyn and the late Richard Tweedie (1994), called the Bible of Markov chains by Peter Glynn, with more emphasis on convergence matters and a more mathematical perspective. The 757 pages book also includes a massive appendix on maths and probability background. As indicated in the preface, “the reason [the authors] thought it would be useful to write a new book is to survey some of the developments made during the 25 years that have elapsed since the publication of Meyn and Tweedie (1993b).” Connecting with the theoretical developments brought by MCMC methods. Like subgeometric rates of convergence to stationarity, sample paths, limit theorems, and concentration inequalities. The book also reflects on the numerous contributions of the authors to the field. Hence a perfect candidate for teaching Markov chains to mathematically well-prepared. graduate audiences. Congrats to the authors!

## a conceptual introduction to HMC

Posted in Books, Statistics with tags adiabatic Monte Carlo, differential geometry, Hamiltonian Monte Carlo, HMC, Markov chain Monte Carlo, MCMC, Monte Carlo Statistical Methods, typical set on September 5, 2017 by xi'an

“…it has proven a empirical success on an incredibly diverse set of target distributions encountered in applied problems.”

**I**n January this year (!), Michael Betancourt posted on arXiv a detailed introduction to Hamiltonian Monte Carlo that recouped some talks of his I attended. Like the one in Boston two years ago. I have (re)read through this introduction to include an HMC section in my accelerating MCMC review for WIREs (which writing does not accelerate very much…)

“…this formal construction is often out of reach of theoretical and applied statisticians alike.”

With the relevant provision of Michael being a friend and former colleague at Warwick, I appreciate the paper at least as much as I appreciated the highly intuitive approach to HMC in his talks. It is not very mathematical and does not provide theoretical arguments for the defence of one solution versus another, but it (still) provides engaging reasons for using HMC.

“One way to ensure computational inefficiency is to waste computational resources evaluating the target density and relevant functions in regions of parameter space that have negligible contribution to the desired expectation.”

The paper starts by insisting on the probabilistic importance of *the typical set*, which amounts to a ring for Gaussian-like distributions. Meaning that in high dimensions the mode of the target is not a point that is particularly frequently visited. I find this notion quite compelling and am at the same time [almost] flabbergasted that I have never heard of it before.

“we will consider only a single parameterization for computing expectations, but we must be careful to ensure that any such computation does not depend on the irrelevant details of that parameterization, such as the particular shape of the probability density function.”

I am not sure I get this sentence. Either it means that an expectation remains invariant under reparameterisation. Or something else and more profound that eludes me. In particular because Michael repeats later (p.25) that the canonical density does not depend on the parameterisation.

“Every choice of kinetic energy and integration time yields a new Hamiltonian transition that will interact differently with a given target distribution (…) when poorly-chosen, however, the performance can suffer dramatically.”

When discussing HMC, Michael tends to get a wee bit overboard with superlatives!, although he eventually points out the need for calibration as in the above quote. The explanation of the HMC move as a combination of uniform moves along isoclines of fixed energy level and of jumps between energy levels does not seem to translate into practical implementations, at least not as explained in the paper. Simulating directly the energy distribution for a complex target distribution does not seem more feasible than moving up likelihood levels in nested sampling. (Unless I have forgotten something essential about HMC!) Similarly, when discussing symplectic integrators, the paper intuitively conveys the reason these integrators avoid Euler’s difficulties, even though one has to seek elsewhere for rigorous explanations. In the end I cannot but agree with the concluding statement that the geometry of the target distribution holds the key to devising more efficient Monte Carlo methods.

## approximations of Markov Chains [another garden of forking paths]

Posted in Books, Mountains, pictures, Statistics, University life with tags approximate MCMC, computational budget, Doeblin's condition, Markov chain Monte Carlo, MCMskv, minimaxity, Monte Carlo Statistical Methods, noisy MCMC, total variation, uniform ergodicity, uniform geometric ergodicity on March 15, 2016 by xi'an**J**ames Johndrow and co-authors from Duke wrote a paper on approximate MCMC that was arXived last August and that I missed. David Dunson‘s talk at MCMski made me aware of it. The paper studies the impact of replacing a valid kernel with a close approximation. Which is a central issue for many usages of MCMC in complex models, as exemplified by the large number of talks on that topic at MCMski.

“All of our bounds improve with the MCMC sample path length at the expected rate in t.”

A major constraint in the paper is Doeblin’s condition, which implies uniform geometric ergodicity. Not only it is a constraint on the Markov kernel but it is also one for the Markov operator in that it may prove impossible to… prove. The second constraint is that the approximate Markov kernel is close enough to the original, which sounds reasonable. Even though one can always worry that the total variation norm is too weak a norm to mean much. For instance, I presume with some confidence that this does not prevent the approximate Markov kernel from not being ergodic, e.g., not irreducible, not absolutely continuous wrt the target, null recurrent or transient. Actually, the assumption is stronger in that there exists a *collection* of approximations for all small enough values ε of the total variation distance. (*Small enough* meaning ε is much smaller than the complement α to 1 of the one step distance between the Markov kernel and the target. With poor kernels, the approximation must thus be *very* good.) This is less realistic than assuming the availability of one single approximation associated with an existing but undetermined distance ε. (For instance, the three examples of Section 3 in the paper show the existence of approximations achieving a certain distance ε, without providing a constructive determination of such approximations.) Under those assumptions, the average of the sequence of Markov moves according to the approximate kernel converges to the target in total variation (and in expectation for bounded functions). With sharp bounds on those distances. I am still a bit worried at the absence of conditions for the approximation to be ergodic.

“…for relatively short path lengths, there should exist a range of values for which aMCMC offers better performance in the compminimax sense.”

The paper also includes computational cost into the picture. Introducing the notion of compminimax error, which is the smallest (total variation) distance among all approximations at a given computational budget. Quite an interesting, innovative, and relevant notion that may however end up being too formal for practical use. And that does not include the time required to construct and calibrate the approximations.