Archive for Zig-Zag

day five at ISBA 22

Posted in Mountains, pictures, Running, Statistics, Travel, University life on July 4, 2022 by xi'an

Woke up even earlier today! Which left me time to work on switching to Leonard Cohen's song titles for my slide frametitles this afternoon (last talk of the whole conference!), and to run once again to Mon(t) Royal as all pools are closed (Happy Canada Day!, except to “freedom convoy” antivaxxers.) Which led to me meeting a raccoon by the side of the path (and morons feeding wildlife).

Had an exciting time at the morning session, where Giacomo Zanella (formerly Warwick) talked on a mixture approach to leave-one-out predictives, with a pseudo-harmonic mean representation, averaging the inverse density across all observations. Better than harmonic? Some assumptions allow for a finite variance, although I am missing the deep argument (in part due to Giacomo's machine-gun delivery pace!) Then Alicia Corbella (Warwick) presented a promising entry into PDMPs by proposing an automated zig-zag sampler, pointing out on the side Joris Bierkens' webpage on state-of-the-art PDMP methodology. In this approach, joint with my other Warwick colleagues Simon Spencer and Gareth Roberts, the zig-zag sampler relies on automatic differentiation, sub-sampling, and bound derivation, with “no further information on the target needed”. And finally Chris Carmona presented a joint work with Geoff Nicholls merging cut posteriors and variational inference to create a meta posterior. Work and talk were motivated by a nice medieval linguistic problem where the latent variables impact the (convergence of the) MCMC algorithm [as in our k-nearest neighbour experience]. Interestingly using normalising [neural spline] flows. The pseudo-posterior seems to depend very much on their modularization rate η, which penalises how much one module influences the next one.
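[For context, and as a hedged aside of my own rather than a rendering of Giacomo's mixture estimator: the classical inverse-density (harmonic-mean like) representation of the leave-one-out predictive that such approaches start from is, for conditionally independent observations,

p(y_i\mid y_{-i}) = \left\{\mathbb{E}_{\pi(\theta\mid y)}\left[\frac{1}{p(y_i\mid\theta)}\right]\right\}^{-1} \approx \left\{\frac{1}{S}\sum_{s=1}^S \frac{1}{p(y_i\mid\theta^{(s)})}\right\}^{-1},\qquad \theta^{(s)}\sim\pi(\theta\mid y)

whose possibly infinite variance is presumably what the mixture construction aims at taming.]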

In the aft, I attended, sort of by chance [due to a missing speaker in the copula session], the end of a session on migration modelling, with a talk by Jason Hilton and Martin Hinsch focussing on the 2015 mass exodus of Syrians through the Mediterranean, away from the joint evils of al-Assad and ISIS. As this was a tragedy whose modelling I had vainly tried to contribute to, I was obviously captivated and frustrated (learning of the IOM missing migrants project!) Fitting the agent-based model actually used ABC, and most particularly our ABC-PMC!!!

My own and final session had Gareth (Warwick) presenting his recent work with Jun Yang and Krzys Łatuszyński (Warwick) on the stereographic projection improvement over regular MCMC, which involves turning the target into a distribution supported by a hypersphere, hence considering a distribution with compact support and higher efficiency. Krzys had explained the principle while driving back from Gregynog two months ago. The idea is somewhat similar to our origaMCMC, which I presented at MCqMC 2016 in Stanford (and never completed), except that our projection was inside a ball. Looking forward to the adaptive version, in the making!
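[As a purely illustrative aside, and not the authors' construction: the standard stereographic correspondence between R^d and the unit sphere S^d that underlies this compactification can be coded in a few lines; the function names below are mine.

import numpy as np

def to_sphere(x):
    # inverse stereographic projection: R^d -> unit sphere S^d in R^(d+1)
    s = x @ x
    return np.append(2 * x, s - 1) / (s + 1)

def to_plane(z):
    # stereographic projection back: S^d minus the north pole -> R^d
    return z[:-1] / (1 - z[-1])

# sanity check that the two maps are inverses of one another
x = np.random.default_rng(0).normal(size=3)
assert np.allclose(to_plane(to_sphere(x)), x)

Pushing a heavy-tailed target on R^d through to_sphere produces a distribution with compact support on the sphere, which is the gain alluded to above.]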

And to conclude this subjective journal from the ISBA conference, borrowing this title by (Westmount born) Leonard Cohen, “Hey, That's No Way to Say Goodbye”… To paraphrase Bilbo Baggins, I have not interacted with at least half the participants half as much as I would have liked. But this was still a reunion, albeit in the new Normal. Hopefully, the conference will not have induced a massive COVID cluster on top of numerous scientific and social exchanges! The following days will tell. Congrats to the ISBA 2022 organisers for achieving a most successful event in these times of uncertainty. And looking forward to the next edition in 2024 at Ca' Foscari, Venezia!!!

 

practical PDMP

Posted in Books, Statistics, University life on December 9, 2021 by xi'an

While in Warwick, last month, I attended a reading group on PDMPs where Filippo Pagani talked about practical PDMP, connected with a recent arXival by Bertazzi, Bierkens and Dobson. The central question when implementing PDMPs is to find a realistic way of solving

\int_0^\tau \lambda(x+tv,v)\,\text{d}t = \epsilon\,,\qquad \epsilon\sim\text{Exp}(1)

to decide on the stopping time (when the process ceases to be deterministic). The usual approach is to use Poisson thinning by finding an upper bound on λ, but this is either difficult or potentially inefficient (and sometimes both).
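[A minimal sketch of this thinning step, assuming a constant bound M ≥ λ along the trajectory and a hypothetical rate function lam of my own naming:

import numpy as np

def next_event_by_thinning(x, v, lam, M, rng=np.random.default_rng()):
    # first event time of the inhomogeneous Poisson process with rate t -> lam(x + t*v, v),
    # assuming lam(.,.) <= M along the ray; candidates come from a homogeneous rate-M process
    t = 0.0
    while True:
        t += rng.exponential(1.0 / M)
        if rng.uniform() < lam(x + t * v, v) / M:   # accept with probability lambda/M
            return t

The efficiency of the loop obviously degrades when the bound M is loose, which is the very difficulty the discretisation schemes below try to bypass.]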

“finding a sharp bound M(s) [for Poisson thinning] can be an extremely challenging problem in most practical settings (…) In order to overcome this problem, we introduce discretisation schemes for PDMPs which make their approximate simulation possible.”

Some of the solutions proposed in Bertazzi et al. rely on

  1. using a frozen (fixed) λ
  2. discretising time and the integral (first order scheme, sketched below)
  3. allowing for more than a jump over a time interval (higher order schemes)
  4. going through control variates (when gradient is Lipschitz and Hessian bounded, with known constants) as it produces a linear rate λ
  5. subsampling (at least for Zig Zag)

with theoretical guarantees that the approximations are convergent as the time step goes to zero. They (almost obviously) remain model-dependent solutions (with illustrations for the Zig Zag and bouncy particle versions), with few worst-case scenarios, but this is an extended investigation into making PDMPs more manageable!
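[As flagged in item 2 above, here is a minimal sketch, my own rendering rather than the authors' code, of a first-order Euler-type discretisation of the Zig Zag process, where over a step of length δ each velocity component flips with probability approximately λ_i(x,v)δ:

import numpy as np

def euler_zigzag(x0, grad_U, delta, n_steps, rng=np.random.default_rng()):
    # grad_U(x) returns the gradient of the negative log-target U at x
    x = np.asarray(x0, dtype=float).copy()
    v = rng.choice([-1.0, 1.0], size=x.size)
    path = [x.copy()]
    for _ in range(n_steps):
        rates = np.maximum(0.0, v * grad_U(x))            # canonical Zig-Zag switching rates
        flips = rng.uniform(size=x.size) < rates * delta  # first-order flip probabilities
        v[flips] *= -1.0
        x = x + delta * v                                 # deterministic drift between events
        path.append(x.copy())
    return np.array(path)

The bias induced by the fixed step δ is exactly what the convergence guarantees mentioned above control as δ goes to zero.]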

ISBA 2021.3

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines on July 1, 2021 by xi'an

Now on the third day, which again started early with a 100% local j-ISBA session. (After a group run to and around Mont Puget, my first real run since 2020!!!) With a second round of talks by junior researchers, from master to postdoc level. Again well-attended. A talk about Bayesian non-parametric sequential taxonomy by Alessandro Zito used the BayesANT acronym, which reminded me of the new wave group Adam and the Ants I was listening to forty years ago, in case they need a song as well as a logo! (Note that BayesANT is also used for a robot using Bayesian optimisation!) And more generally a wide variety of themes. Thanks to the j-organisers of this 100% live session!

The next session was on PDMPs, which I helped organise, with Manon Michel speaking from Marseille, exploiting the symmetry around the gradient, which is distribution-free! Then, remotely from Tokyo, Kengo Kamatani extended the high-dimensional scaling limit to the Zig-Zag sampler, exhibiting an argument against small refreshment rates, and Murray Pollock, from Newcastle, laid out quite clearly the working principles of the Restore algorithm, including why coupling from the past is available in this setting. A well-attended session despite the early hour (in the USA).

Another session of interest for me [which I attended by myself as everyone else was at lunch in CIRM!] was the contributed C16 on variational and scalable inference that included a talk on hierarchical Monte Carlo fusion (with my friends Gareth and Murray as co-authors), Darren’s call to adopt functional programming in order to save Bayesian computing from extinction, normalising flows for modularisation, and Dennis’ adversarial solutions for Bayesian design, avoiding the computation of the evidence.

Wes Johnson's lecture was about stories of setting prior distributions based on experts' opinions. Which reminded me of the short paper Kaniav Kamary and I wrote about ten years ago, in response to a paper on the topic in The American Statistician. And I could not understand the discrepancy between two Bayes factors based on Normal versus Cauchy priors, until I was told they had been mistakenly used repeatedly.

Rushing out of dinner, I attended both the non-parametric session (live with Marta and Antonio!) and the high-dimension computational session on Bayesian model choice (mute!). A bit of a schizophrenic moment, but one allowing me to get a rough picture of both areas. At once. Including an adaptive MCMC scheme for selecting models by Jim Griffin, which could be run directly over the model space. With my never-ending wondering at the meaning of neighbour models.

the buzz about nuzz

Posted in Books, Mountains, pictures, Statistics on April 6, 2020 by xi'an

“…expensive in these terms, as for each root, Λ(x(s),v) (at the cost of one epoch) has to be evaluated for each root finding iteration, for each node of the numerical integral…”

When using the ZigZag sampler, the main (?) difficulty is in producing the velocity switches, as these are generated as interarrival times of an inhomogeneous Poisson process. When the rate of this process cannot be integrated out in an analytical manner, the only generic approach I know of is Poisson thinning, obtained by finding an integrable upper bound on this rate, generating from this new process, and subsampling. Finding the bound is however far from straightforward and may anyway result in an inefficient sampler. This new paper by Simon Cotter, Thomas House and Filippo Pagani makes several proposals to simplify this simulation, Nuzz standing for numerical ZigZag. Even better (!), their approach is based on what they call the Sellke construction, Tom Sellke being a probabilist and statistician at Purdue University (trivia: whom I met when spending a postdoctoral year there in 1987-1988) who also wrote a fundamental paper on the opposition between Bayes factors and p-values with Jim Berger.
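[For reference, in its basic form and without extra refreshment, the ZigZag switching rate for coordinate i when targeting π(x) ∝ exp{−U(x)} is

\lambda_i(x,v) = \max\{0,\, v_i\,\partial_i U(x)\},\qquad i=1,\dots,d

so that each coordinate carries its own inhomogeneous Poisson clock along the current ray x+tv.]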

“We chose as a measure of algorithm performance the largest Kolmogorov-Smirnov (KS) distance between the MCMC sample and true distribution amongst all the marginal distributions.”

The practical trick is rather straightforward in that it sums up as the exponentiation of the inverse cdf method, completed with a numerical resolution of the inversion. Based on the QAGS (Quadrature Adaptive Gauss-Kronrod Singularities) integration routine. In order to save time, Kingman's superposition trick is invoked so that only one inversion is required, rather than d, the dimension of the variable of interest. This nuzzled version of ZigZag can furthermore be interpreted as a PDMP per se. Except that it retains a numerical error, whose impact on convergence is analysed in the paper. In terms of Wasserstein distance between the invariant measures. The paper concludes with a numerical comparison between Nuzz and random walk Metropolis-Hastings, HMC, and manifold MALA, using the number of evaluations of the likelihood as a measure of time requirement. Tuning for Nuzz is described, but not for the competition. Rather dramatically, the Nuzz algorithm performs worse than this competition when counting one epoch for each likelihood computation and better when counting one epoch for each integral inversion. Which amounts to perfect inversion, unsurprisingly. As a final remark, all models are more or less Normal, with very smooth level sets, maybe not an ideal range of test cases.
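[Not the authors' QAGS-based implementation, but a minimal sketch of the underlying inverse-cdf idea, using generic scipy quadrature and root finding and a hypothetical rate function lam:

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def next_event_by_inversion(x, v, lam, t_max=100.0, rng=np.random.default_rng()):
    # draw E ~ Exp(1) and numerically solve int_0^tau lam(x + t*v, v) dt = E for tau
    E = rng.exponential(1.0)
    Lam = lambda tau: quad(lambda t: lam(x + t * v, v), 0.0, tau)[0]   # integrated rate
    if Lam(t_max) < E:                      # no switch before the horizon t_max
        return t_max
    return brentq(lambda tau: Lam(tau) - E, 0.0, t_max)

The numerical error of the quadrature and of the root finder is precisely the perturbation whose impact on the invariant measure the paper quantifies in Wasserstein distance.]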

 

BayesComp’20

Posted in Books, pictures, Statistics, Travel, University life on January 10, 2020 by xi'an

First, I really have to congratulate my friend Jim Hobert for a great organisation of the meeting, adopting my favourite minimalist principles (no name tag, no “goodies” apart from the conference schedule, no official talks). Without any pretense at objectivity, I also appreciated very much the range of topics and the sweet frustration of having to choose between two or three sessions each time. Here are some notes taken during some talks (with no implicit implication for the talks not mentioned, re. the above frustration!, as well as very short nights making sudden lapses in concentration highly likely).

On Day 1, Paul Fearnhead’s inaugural plenary talk was on continuous time Monte Carlo methods, mostly bouncy particle and zig-zag samplers, with a detailed explanation on the simulation of the switching times which likely brought the audience up to speed even if they had never heard of them. And an opening on PDMPs used as equivalents to reversible jump MCMC, reminding me of the continuous time (point process) solutions of Matthew Stephens for mixture inference (and of Preston, Ripley, Møller).

The same morn I heard of highly efficient techniques to handle very large matrices and p>n variable selection from Akihiko Nishimura, and of a delayed acceptance ABC using a cheap proxy model from Ruth Baker. Somewhat different from indirect inference. I found the reliance on ESS somewhat puzzling given the intractability of the likelihood (and the low reliability of the frequency estimate) and the lack of connection with the “real” posterior. At the same ABC session, Umberto Picchini spoke on a joint work with Richard Everitt (Warwick) on linking ABC and pseudo-marginal MCMC by bootstrap. Actually, the notion of ABC likelihood was already proposed as pseudo-marginal ABC by Anthony Lee, Christophe Andrieu and Arnaud Doucet in the discussion of Fearnhead and Prangle (2012), but I wonder at the focus on being unbiased when the quantity is not the truth, i.e. the “real” likelihood. It would seem more appropriate to attempt better kernel estimates on the distribution of the summary itself. The same session also involved David Frazier, who linked our work on ABC for misspecified models with an on-going investigation of synthetic likelihood.

Later, there was a surprise occurrence of the Bernoulli factory in a talk by Radu Herbei on Gaussian process priors with accept-reject algorithms, leading to exact MCMC, although the computing implementation remains uncertain. And several discussions during the poster session, incl. one on the planning of a 2021 workshop in Oaxaca centred on objective Bayes advances as we received acceptance of our proposal by BIRS today!

On Day 2, David Blei gave a plenary introduction to variational Bayes inference and latent Dirichlet allocations, somewhat too introductory for my taste although other participants enjoyed this exposition. He also mentioned a recent JASA paper on the frequentist consistency of variational Bayes that I should check. Speaking later with PhD students, they really enjoyed this opening on an area they did not know that well.

A talk by Kengo Kamatani (whom I visited last summer) on improved ergodicity rates for heavy-tailed targets and Crank-Nicolson modifications to the random walk proposal (which uses an AR(1) representation instead of the random walk). With the clever idea of adding the scale of the proposal as an extra parameter with a prior of its own. Gaining one order of magnitude in the convergence speed (i.e. from d to 1 and from d² to d, where d is the dimension), which is quite impressive (and just published in JAP). Veronica Rockova linked Bayesian variable selection and machine learning via ABC, with conditions on the prior for model consistency. And a novel approach using part of the data to learn an ABC partial posterior, which reminded me of the partial Bayes factors of the 1990s, although it is presumably unrelated. And a replacement of the original rejection ABC via multi-armed bandits, where each variable is represented by an arm, called ABC Bayesian forests. Recalling the simulation trick behind Thompson's approach, reproduced for the inclusion or exclusion of variates and producing a fixed estimate for the (marginal) inclusion probabilities, which makes it sound like a prior-feedback form of empirical Bayes. Followed by a talk of Gregor Kastner on MCMC handling of large time series with specific priors and a massive number of parameters.
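[For readers unfamiliar with the parenthetical above, the standard preconditioned Crank-Nicolson proposal replaces the random walk move x' = x + βξ with the AR(1) move

x' = \sqrt{1-\beta^2}\,x + \beta\,\xi,\qquad \xi\sim\mathcal{N}(0,C)

with C the reference (prior) covariance, a proposal that leaves the Gaussian reference measure invariant; Kamatani's modification for heavy tails presumably departs from this basic form.]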

The afternoon also had a wealth of exciting talks and missed opportunities (in the other sessions!). Which ended up with a strong if unintended French bias, since I listened to Christophe Andrieu, Gabriel Stoltz, Umut Simsekli, and Manon Michel on different continuous time processes, with Umut linking GANs, multidimensional optimal transport, sliced-Wasserstein, generative models, and new stochastic differential equations. Manon Michel gave a highly intuitive talk on creating non-reversibility, getting rid of refreshment rates in PDMPs to kill any form of reversibility.
