## ISBA 2021.3

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , on July 1, 2021 by xi'an

Now on the third day which again started early with a 100% local j-ISBA session. (After a group run to and around Mont Puget, my first real run since 2020!!!) With a second round of talks by junior researchers from master to postdoc level. Again well-attended. A talk about Bayesian non-parametric sequential taxinomy by Alessandro Zito used the BayesANT acronym, which reminded me of the new vave group Adam and the Ants I was listening to forty years ago, in case they need a song as well as a logo! (Note that BayesANT is also used for a robot using Bayesian optimisation!) And more generally a wide variety in the themes. Thanks to the j-organisers of this 100% live session!

The next session was on PDMPs, which I helped organise, with Manon Michel speaking from Marseille, exploiting the symmetry around the gradient, which is distribution-free! Then, remotely, Kengo Kamatani, speaking from Tokyo, who expanded the high-dimensional scaling limit to the Zig-Zag sampler, exhibiting an argument against small refreshment rates, and Murray Pollock, from Newcastle, who exposed quite clearly the working principles of the Restore algorithm, including why coupling from the past was available in this setting. A well-attended session despite the early hour (in the USA).

Another session of interest for me [which I attended by myself as everyone else was at lunch in CIRM!] was the contributed C16 on variational and scalable inference that included a talk on hierarchical Monte Carlo fusion (with my friends Gareth and Murray as co-authors), Darren’s call to adopt functional programming in order to save Bayesian computing from extinction, normalising flows for modularisation, and Dennis’ adversarial solutions for Bayesian design, avoiding the computation of the evidence.

Wes Johnson’s lecture was about stories with setting prior distributions based on experts’ opinions. Which reminded me of the short paper Kaniav Kamary and myself wrote about ten years ago, in response to a paper on the topic in the American Statistician. And could not understand the discrepancy between two Bayes factors based on Normal versus Cauchy priors, until I was told they were mistakenly used repeatedly.

Rushing out of dinner, I attended both the non-parametric session (live with Marta and Antonio!) and the high-dimension computational session on Bayesian model choice (mute!). A bit of a schizophrenic moment, but allowing to get a rough picture in both areas. At once. Including an adaptive MCMC scheme for selecting models by Jim Griffin. Which could be run directly over the model space. With my ever-going wondering at the meaning of neighbour models.

## general perspective on the Metropolis–Hastings kernel

Posted in Books, Statistics with tags , , , , , , , , , , , , , on January 14, 2021 by xi'an

[My Bristol friends and co-authors] Christophe Andrieu, and Anthony Lee, along with Sam Livingstone arXived a massive paper on 01 January on the Metropolis-Hastings kernel.

“Our aim is to develop a framework making establishing correctness of complex Markov chain Monte Carlo kernels a purely mechanical or algebraic exercise, while making communication of ideas simpler and unambiguous by allowing a stronger focus on essential features (…) This framework can also be used to validate kernels that do not satisfy detailed balance, i.e. which are not reversible, but a modified version thereof.”

A central notion in this highly general framework is, extending Tierney (1998), to see an MCMC kernel as a triplet involving a probability measure μ (on an extended space), an involution transform φ generalising the proposal step (i.e. þ²=id), and an associated acceptance probability ð. Then μ-reversibility occurs for

$\eth(\xi)\mu(\text{d}\xi)= \eth(\phi(\xi))\mu^{\phi}(\text{d}\xi)$

with the rhs involving the push-forward measure induced by μ and φ. And furthermore there is always a choice of an acceptance probability ð ensuring for this equality to happen. Interestingly, the new framework allows for mostly seamless handling of more complex versions of MCMC such as reversible jump and parallel tempering. But also non-reversible kernels, incl. for instance delayed rejection. And HMC, incl. NUTS. And pseudo-marginal, multiple-try, PDMPs, &c., &c. it is remarkable to see such a general theory emerging a this (late?) stage of the evolution of the field (and I will need more time and attention to understand its consequences).

## away from CIRM

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on November 5, 2020 by xi'an

Due to the new lockdown measures enforced in France and in particular in Marseilles, the CIRM workshop on QMC and randomness has turned virtual, and I will thus give my talk on Coordinate sampler : A non-reversible Gibbs-like sampler from Paris. Rather than from the Luminy campus after an early morning run to the top of Mont Puget as we used to do on the previous workshop there. With versions of PDMP running on QMC (which makes sense when considering the deterministic component of the sampler).

Posted in Books, Mountains, pictures, Statistics with tags , , , , , , , , , , , , , on April 6, 2020 by xi'an

“…expensive in these terms, as for each root, Λ(x(s),v) (at the cost of one epoch) has to be evaluated for each root finding iteration, for each node of the numerical integral

When using the ZigZag sampler, the main (?) difficulty is in producing velocity switch as the switches are produced as interarrival times of an inhomogeneous Poisson process. When the rate of this process cannot be integrated out in an analytical manner, the only generic approach I know is in using Poisson thinning, obtained by finding an integrable upper bound on this rate, generating from this new process and subsampling. Finding the bound is however far from straightforward and may anyway result in an inefficient sampler. This new paper by Simon Cotter, Thomas House and Filippo Pagani makes several proposals to simplify this simulation, Nuzz standing for numerical ZigZag. Even better (!), their approach is based on what they call the Sellke construction, with Tom Sellke being a probabilist and statistician at Purdue University (trivia: whom I met when spending a postdoctoral year there in 1987-1988) who also wrote a fundamental paper on the opposition between Bayes factors and p-values with Jim Berger.

“We chose as a measure of algorithm performance the largest Kolmogorov-Smirnov (KS) distance between the MCMC sample and true distribution amongst all the marginal distributions.”

The practical trick is rather straightforward in that it sums up as the exponentiation of the inverse cdf method, completed with a numerical resolution of the inversion. Based on the QAGS (Quadrature Adaptive Gauss-Kronrod Singularities) integration routine. In order to save time Kingman’s superposition trick only requires one inversion rather than d, the dimension of the variable of interest. This nuzzled version of ZIgZag can furthermore be interpreted as a PDMP per se. Except that it retains a numerical error, whose impact on convergence is analysed in the paper. In terms of Wasserstein distance between the invariant measures. The paper concludes with a numerical comparison between Nuzz and random walk Metropolis-Hastings, HMC, and manifold MALA, using the number of evaluations of the likelihood as a measure of time requirement. Tuning for Nuzz is described, but not for the competition. Rather dramatically the Nuzz algorithm performs worse than this competition when counting one epoch for each likelihood computation and better when counting one epoch for each integral inversion. Which amounts to perfect inversion, unsurprisingly. As a final remark, all models are more or less Normal, with very smooth level sets, maybe not an ideal range

## coordinate sampler on-line

Posted in Statistics with tags , , , , , , on March 13, 2020 by xi'an