Archive for MALA

the buzz about nuzz

Posted in Books, Mountains, pictures, Statistics with tags , , , , , , , , , , , , , on April 6, 2020 by xi'an

“…expensive in these terms, as for each root, Λ(x(s),v) (at the cost of one epoch) has to be evaluated for each root finding iteration, for each node of the numerical integral

When using the ZigZag sampler, the main (?) difficulty is in producing velocity switch as the switches are produced as interarrival times of an inhomogeneous Poisson process. When the rate of this process cannot be integrated out in an analytical manner, the only generic approach I know is in using Poisson thinning, obtained by finding an integrable upper bound on this rate, generating from this new process and subsampling. Finding the bound is however far from straightforward and may anyway result in an inefficient sampler. This new paper by Simon Cotter, Thomas House and Filippo Pagani makes several proposals to simplify this simulation, Nuzz standing for numerical ZigZag. Even better (!), their approach is based on what they call the Sellke construction, with Tom Sellke being a probabilist and statistician at Purdue University (trivia: whom I met when spending a postdoctoral year there in 1987-1988) who also wrote a fundamental paper on the opposition between Bayes factors and p-values with Jim Berger.

“We chose as a measure of algorithm performance the largest Kolmogorov-Smirnov (KS) distance between the MCMC sample and true distribution amongst all the marginal distributions.”

The practical trick is rather straightforward in that it sums up as the exponentiation of the inverse cdf method, completed with a numerical resolution of the inversion. Based on the QAGS (Quadrature Adaptive Gauss-Kronrod Singularities) integration routine. In order to save time Kingman’s superposition trick only requires one inversion rather than d, the dimension of the variable of interest. This nuzzled version of ZIgZag can furthermore be interpreted as a PDMP per se. Except that it retains a numerical error, whose impact on convergence is analysed in the paper. In terms of Wasserstein distance between the invariant measures. The paper concludes with a numerical comparison between Nuzz and random walk Metropolis-Hastings, HMC, and manifold MALA, using the number of evaluations of the likelihood as a measure of time requirement. Tuning for Nuzz is described, but not for the competition. Rather dramatically the Nuzz algorithm performs worse than this competition when counting one epoch for each likelihood computation and better when counting one epoch for each integral inversion. Which amounts to perfect inversion, unsurprisingly. As a final remark, all models are more or less Normal, with very smooth level sets, maybe not an ideal range


Langevin on a wrong bend

Posted in Books, Statistics with tags , , , , , , , on October 19, 2017 by xi'an

Arnak Dalayan and Avetik Karagulyan (CREST) arXived a paper the other week on a focussed study of the Langevin algorithm [not MALA] when the gradient of the target is incorrect. With the following improvements [quoting non-verbatim from the paper]:

  1. a varying-step Langevin that reduces the number of iterations for a given Wasserstein precision, compared with recent results by e.g. Alan Durmus and Éric Moulines;
  2. an extension of convergence results for error-prone evaluations of the gradient of the target (i.e., the gradient is replaced with a noisy version, under some moment assumptions that do not include unbiasedness);
  3. a new second-order sampling algorithm termed LMCO’, with improved convergence properties.

What is particularly interesting to me in this setting is the use in all these papers of a discretised Langevin diffusion (a.k.a., random walk with a drift induced by the gradient of the log-target) without the original Metropolis correction. The results rely on an assumption of [strong?] log-concavity of the target, with “user-friendly” bounds on the Wasserstein distance depending on the constants appearing in this log-concavity constraint. And so does the adaptive step. (In the case of the noisy version, the bias and variance of the noise also matter. As pointed out by the authors, there is still applicability to scaling MCMC for large samples. Beyond pseudo-marginal situations.)

“…this, at first sight very disappointing behavior of the LMC algorithm is, in fact, continuously connected to the exponential convergence of the gradient descent.”

The paper concludes with an interesting mise en parallèle of Langevin algorithms and of gradient descent algorithms, since the convergence rates are the same.

accelerating Metropolis-Hastings algorithms by delayed acceptance

Posted in Books, Statistics, University life with tags , , , , , , , , on March 5, 2015 by xi'an

Marco Banterle, Clara Grazian, Anthony Lee, and myself just arXived our paper “Accelerating Metropolis-Hastings algorithms by delayed acceptance“, which is an major revision and upgrade of our “Delayed acceptance with prefetching” paper of last June. Paper that we submitted at the last minute to NIPS, but which did not get accepted. The difference with this earlier version is the inclusion of convergence results, in particular that, while the original Metropolis-Hastings algorithm dominates the delayed version in Peskun ordering, the later can improve upon the original for an appropriate choice of the early stage acceptance step. We thus included a new section on optimising the design of the delayed step, by picking the optimal scaling à la Roberts, Gelman and Gilks (1997) in the first step and by proposing a ranking of the factors in the Metropolis-Hastings acceptance ratio that speeds up the algorithm.  The algorithm thus got adaptive. Compared with the earlier version, we have not pursued the second thread of prefetching as much, simply mentioning that prefetching and delayed acceptance could be merged. We have also included a section on the alternative suggested by Philip Nutzman on the ‘Og of using a growing ratio rather than individual terms, the advantage being the probability of acceptance stabilising when the number of terms grows, with the drawback being that expensive terms are not always computed last. In addition to our logistic and mixture examples, we also study in this version the MALA algorithm, since we can postpone computing the ratio of the proposals till the second step. The gain observed in one experiment is of the order of a ten-fold higher efficiency. By comparison, and in answer to one comment on Andrew’s blog, we did not cover the HMC algorithm, since the preliminary acceptance step would require the construction of a proxy to the acceptance ratio, in order to avoid computing a costly number of derivatives in the discretised Hamiltonian integration.

high-dimensional stochastic simulation and optimisation in image processing [day #3]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on August 31, 2014 by xi'an

Last and maybe most exciting day of the “High-dimensional Stochastic Simulation and Optimisation in Image Processing” in Bristol as it was exclusively about simulation (MCMC) methods. Except my own talk on ABC. And Peter Green’s on consistency of Bayesian inference in non-regular models. The talks today were indeed about using convex optimisation devices to speed up MCMC algorithms with tools that were entirely new to me, like the Moreau transform discussed by Marcelo Pereyra. Or using auxiliary variables à la RJMCMC to bypass expensive Choleski decompositions. Or optimisation steps from one dual space to the original space for the same reason. Or using pseudo-gradients on partly differentiable functions in the talk by Sylvain Lecorff on a paper commented earlier in the ‘Og. I particularly liked the notion of Moreau regularisation that leads to more efficient Langevin algorithms when the target is not regular enough. Actually, the discretised diffusion itself may be geometrically ergodic without the corrective step of the Metropolis-Hastings acceptance. This obviously begs the question of an extension to Hamiltonian Monte Carlo. And to multimodal targets, possibly requiring as many normalisation factors as there are modes. So, in fine, a highly informative workshop, with the perfect size and the perfect crowd (which happened to be predominantly French, albeit from a community I did not have the opportunity to practice previously). Massive kudos to Marcello for putting this workshop together, esp. on a week where family major happy events should have kept him at home!

As the workshop ended up in mid-afternoon, I had plenty of time for a long run with Florence Forbes down to the Avon river and back up among the deers of Ashton Court, avoiding most of the rain, all of the mountain bikes on a bike trail that sounded like trail running practice, and building enough of an appetite for the South Indian cooking of the nearby Thali Café. Brilliant!

high-dimensional stochastic simulation and optimisation in image processing [day #2]

Posted in pictures, Statistics, Travel, Uncategorized, University life, Wines with tags , , , , , , on August 30, 2014 by xi'an

After a nice morning run down Leigh Woods and on the muddy banks of the Avon river, I attended a morning session on hyperspectral image non-linear modelling. Topic about which I knew nothing beforehand. Hyperspectral images are 3-D images made of several wavelengths to improve their classification as a mixture of several elements. The non-linearity is due to the multiple reflections from the ground as well as imperfections in the data collection. I found this new setting of clear interest, from using mixtures to exploring Gaussian processes and Hamiltonian Monte Carlo techniques on constrained spaces… Not to mention the “debate” about using Bayesian inference versus optimisation. It was overall a day of discovery as I am unaware of the image processing community (being the outlier in this workshop!) and of their techniques. The problems mostly qualify as partly linear high-dimension inverse problems, with rather standard if sometimes hybrid MCMC solutions. (The day ended even more nicely with another long run in the fields of Ashton Court and a conference diner by the river…)