Archive for gradient algorithm

6th Workshop on Sequential Monte Carlo Methods

Posted in Mountains, pictures, Statistics, Travel, University life on May 16, 2024 by xi'an

Very glad to be back at an SMC workshop, as it has been nine years since I attended SMC 2015 in Malakoff! All the more so for the workshop taking place in Edinburgh, at the Bayes Centre. It is one of these places where I feel I am returning to familiar grounds, with accumulated memories, like my last visit there when I had tea with Mike Titterington…

The overall pace of the workshop was quite nice, with long breaks for informal discussions (and time for ‘oggin’!) and interesting late-afternoon poster sessions, helped by the small number of posters at each instance, incl. one on reversible jump HMC. Here are a few scribbled entries about some of the talks from the first two days.

After my opening talk (!), Joaquín Míguez talked about the impact of a sequential (Euler-Maruyama) discretisation scheme for stochastic differential equations on Bayesian filtering, with control of the approximation effect. Axel Finke (in joint work with Adrien Corenflos, now an ERC Ocean postdoc in Warwick) built a sequence of particle filter algorithms targeting good performance (high expected jumping distance) against both large dimensions and long time horizons, exploiting MALA-like gradient shifts as well as prior information, with the conclusion that their jack-of-all-trades solutions, Particle-MALA and Particle-mGRAD, enjoy this resistance in nearly normal models. Interesting reminder of the auxiliary particle trick and good insights on using the smoothing target, even when accounting for the computing time, but too many versions for a single talk without checking against the preprint.
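
For the record, here is a minimal R sketch of an Euler-Maruyama discretisation for a generic one-dimensional SDE dX = a(X) dt + b(X) dW, with toy drift and diffusion functions of my own choosing (not the filtering model of the talk):

# Euler-Maruyama discretisation of dX = a(X) dt + b(X) dW
# (generic illustration, not the specific model of the talk)
euler_maruyama <- function(x0, a, b, t_end = 1, n = 1000) {
  dt <- t_end / n
  x <- numeric(n + 1)
  x[1] <- x0
  for (k in 1:n)
    x[k + 1] <- x[k] + a(x[k]) * dt + b(x[k]) * sqrt(dt) * rnorm(1)
  x
}
# toy example: Ornstein-Uhlenbeck path, dX = -X dt + 0.5 dW
path <- euler_maruyama(1, a = function(x) -x, b = function(x) 0.5)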

The SMC sampler-like algorithm involves propagating N “seed” particles z(i), with a mutation mechanism consisting of the generation of N integrator snippets z := (z, ψ(z), ψ²(z), …) started at every seed particle z(i), resulting in N×(T+1) particles which are then whittled down to a set of N seed particles using a standard resampling scheme. Andrieu et al., 2024

Christophe Andrieu talked about Monte Carlo sampling with integrator snippets, starting from recycling solutions for the HMC leapfrog integrator and unfolding Hamiltonians to move more easily. The snippets, representing discretised paths along the level sets, are used as particles, with zero, one, or more points picked along each path, since the importance weights connect with multinomial HMC.
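
A heavily simplified R sketch of the snippet mechanism, on a toy one-dimensional Gaussian target, with the weighting and resampling reduced to their crudest version (the actual unfolding and weighting in Andrieu et al. are more involved):

# leapfrog snippet (z, psi(z), ..., psi^Tl(z)) started at seed z with velocity v
leapfrog_snippet <- function(z, v, grad_logpi, Tl = 10, eps = 0.1) {
  snip <- matrix(NA, Tl + 1, 2)
  snip[1, ] <- c(z, v)
  for (t in 1:Tl) {
    v <- v + eps / 2 * grad_logpi(z)
    z <- z + eps * v
    v <- v + eps / 2 * grad_logpi(z)
    snip[t + 1, ] <- c(z, v)
  }
  snip
}
# toy Gaussian target, N seeds, N(Tl+1) candidate particles, crude reweighting
logpi  <- function(z) dnorm(z, log = TRUE)
glogpi <- function(z) -z
N <- 100; Tl <- 10
seeds <- rnorm(N, sd = 3)
cand <- do.call(rbind, lapply(seeds, function(z)
  leapfrog_snippet(z, rnorm(1), glogpi, Tl)))
logw <- logpi(cand[, 1]) - cand[, 2]^2 / 2          # extended target on (z, v)
new_seeds <- cand[sample(nrow(cand), N, replace = TRUE,
                         prob = exp(logw - max(logw))), 1]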

This relatively small algorithmic modification of the conditional particle filter, which we call the conditional backward sampling particle filter, has a dramatically improved performance over the conditional particle filter. Karjalainen et al., 2024

Anthony Lee looked at mixing times for backward sampling SMC (CBPF/ancestor sampling), cf. Lee et al. (2020), where the backward step consists in computing the weight of a randomly drawn backward or ancestral history, improving on earlier results to reach a mixing time of O(log T) and a complexity of O(T log T) (with T the time horizon), thanks to maximal coupling and boundedness assumptions on the prior and likelihood functions.
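
For illustration, a generic backward-simulation pass in R, with placeholder forward particles x, weights W, and transition density f (the coupling and mixing-time analysis of the talk is of course not reproduced here):

# generic backward-simulation pass: x[t, i] forward particles, W[t, i] weights,
# f(x_next, x_prev) the transition density (vectorised in x_prev)
backward_sample <- function(x, W, f) {
  Tt <- nrow(x); N <- ncol(x)
  idx <- numeric(Tt)
  idx[Tt] <- sample(N, 1, prob = W[Tt, ])
  for (t in (Tt - 1):1) {
    bw <- W[t, ] * f(x[t + 1, idx[t + 1]], x[t, ])
    idx[t] <- sample(N, 1, prob = bw)
  }
  x[cbind(1:Tt, idx)]                                # one backward trajectory
}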

Neil Chada presented work on Bayesian multilevel Monte Carlo for deep neural networks, à la Giles, with a telescoping identity. Always puzzling to envision a prior on all parameters of a neural network. Achieving a computational cost inversely proportional to the MSE, at best. With a useful reminder that pushing the size of the NN to infinity results in a (poor) Gaussian process prior (Sell et al., 2023).
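
A hedged R sketch of the telescoping estimator à la Giles, where sampler_level is a placeholder of mine returning coupled fine/coarse draws at each level (not the specific construction of the talk):

# telescoping estimator E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}],
# with sampler_level(l, n) a placeholder returning n coupled (fine, coarse) draws
mlmc_estimate <- function(sampler_level, L, n_per_level) {
  est <- 0
  for (l in 0:L) {
    draws <- sampler_level(l, n_per_level[l + 1])    # two columns: fine, coarse
    est <- est + if (l == 0) mean(draws[, 1]) else mean(draws[, 1] - draws[, 2])
  }
  est
}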

On my first evening, I stopped with a friend at my favourite Blonde [restaurant], as in almost every other visit to Edinburgh, enjoyable as always, but I also found the huge offer of Asian minimarkets in the area too tempting to resist, between Indian, Korean, and Chinese products (although with a disappointing hojicha!). As I could not reach any new Munro by train or bus within a reasonable time range, I resorted to the nearer Pentland Hills, with a stop by Rosslyn Chapel (mostly of Da Vinci Code fame, if classic enough), and some delays in finding a bus to get there (misled by Google Maps!) and a trail (misled by my poor map-reading skills) up the actual hills. The mist did not help either.

generalizing Hamiltonian Monte Carlo with neural networks

Posted in Statistics on April 25, 2018 by xi'an

Daniel Levy, Matthew Hoffman, and Jascha Sohl-Dickstein pointed out to me a recent paper of theirs, submitted to and accepted by ICLR 2018, with the above title. This allowed me to discover the open source handling of paper reviews at ICLR, which I find quite convincing, except for not using MathJax or another medium for LaTeX formulas. And which provides a collection of comments besides mine. (Disclaimer: I was not involved in the processing of this paper for ICLR!)

“Ultimately our goal (and that of HMC) is to produce a proposal that mixes efficiently, not to simulate Hamiltonian dynamics accurately.”

The starting concept is the same as that of GANs (generative adversarial networks), discussed here a few weeks ago, complemented by a new HMC that also uses deep neural networks to represent the HMC trajectory. (Also seen in earlier papers by, e.g., Strathmann.) The novelty in the HMC seems to be a binary direction indicator on top of the velocity. The leapfrog integrator is also modified, with a location-scale generalisation for the velocity and a half-half location-scale move for the original target x, the functions appearing in the location-scale aspects being learned by neural nets, towards minimising the lag-one auto-correlation, plus an extra penalty for not moving enough. Reflecting on the recent MCMC literature and in particular on the presentations at BayesComp last month, judging from comments of participants, this inclusion of neural tools in the tuning of MCMC algorithms sounds like a steady trend in the community. I am slightly at a loss about the adaptive aspects of the trend with regard to the Markovianity of the outcome.
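
For reference, the vanilla leapfrog update that the learned location-scale version generalises, as a minimal R sketch on a generic target (not the neural-net-augmented operator of the paper):

# vanilla leapfrog integrator over L steps of size eps (volume preserving)
leapfrog <- function(x, v, grad_logpi, eps, L) {
  v <- v + eps / 2 * grad_logpi(x)
  for (l in 1:L) {
    x <- x + eps * v
    if (l < L) v <- v + eps * grad_logpi(x)
  }
  v <- v + eps / 2 * grad_logpi(x)
  list(x = x, v = v)
}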

“To compute the Metropolis-Hastings acceptance probability for a deterministic transition, the operator must be invertible and have a tractable Jacobian.”

A remark (above) that seems to date back at least to Peter Green’s reversible jump, duly mentioned in the paper. When reading about the performances of this new learning HMC, I could not see where the learning steps for the parameters of the leapfrog operators were accounted for, although the authors mention an identical number of gradient computations (which I take to mean the same thing). One evaluation of this method against earlier ones (Fig. 2) checks successive values of the likelihood, which may be intuitive enough but does not necessarily establish convergence to the right region, since the posterior may concentrate away from the maximum likelihood.
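
Namely, for a deterministic involutive proposal y = T(x) (like a leapfrog step followed by a momentum flip), the acceptance ratio picks up the Jacobian of T, which is unity for the standard leapfrog but presumably not for the learned location-scale updates, hence the requirement. A generic R sketch with placeholder functions of mine:

# MH acceptance for a deterministic involutive proposal y = T(x),
# with log_abs_jac(x) the log absolute Jacobian determinant of T at x
mh_involutive_step <- function(x, Tmap, logpi, log_abs_jac) {
  y <- Tmap(x)
  log_alpha <- logpi(y) - logpi(x) + log_abs_jac(x)
  if (log(runif(1)) < log_alpha) y else x
}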

One bicycle for two

Posted in R, Running on March 23, 2011 by xi'an

Robin showed me a mathematical puzzle today that reminded me of a story my grand-father used to tell. When he was young, he and his cousin worked in the same place and on Sundays they used to visit my great-grand-mother in another village. However, they only had one bicycle between them, so they would share it by alternating walking and cycling episodes every 200 meters or so! Robin’s problem is to find the optimal portions of the distance spent on the bicycle by each of the two cousins, given their respective walking and cycling speeds.
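
One natural way to set the problem up (my reading, not necessarily Robin’s solution): if the first cousin cycles a fraction x of the route and walks the rest, the second does the reverse, and the optimal x equalises the two travel times. A quick R check with made-up speeds:

# cousin 1 cycles a fraction x, walks 1 - x; cousin 2 does the reverse;
# the optimal x equalises the two per-unit-distance travel times
bike_split <- function(w1, c1, w2, c2) {
  t1 <- function(x) x / c1 + (1 - x) / w1
  t2 <- function(x) (1 - x) / c2 + x / w2
  uniroot(function(x) t1(x) - t2(x), c(0, 1))$root
}
bike_split(w1 = 5, c1 = 20, w2 = 4, c2 = 15)   # made-up speeds in km/h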

Continue reading