Archive for partly deterministic processes

the buzz about nuzz

Posted in Books, Mountains, pictures, Statistics with tags , , , , , , , , , , , , , on April 6, 2020 by xi'an

“…expensive in these terms, as for each root, Λ(x(s),v) (at the cost of one epoch) has to be evaluated for each root finding iteration, for each node of the numerical integral

When using the ZigZag sampler, the main (?) difficulty is in producing velocity switch as the switches are produced as interarrival times of an inhomogeneous Poisson process. When the rate of this process cannot be integrated out in an analytical manner, the only generic approach I know is in using Poisson thinning, obtained by finding an integrable upper bound on this rate, generating from this new process and subsampling. Finding the bound is however far from straightforward and may anyway result in an inefficient sampler. This new paper by Simon Cotter, Thomas House and Filippo Pagani makes several proposals to simplify this simulation, Nuzz standing for numerical ZigZag. Even better (!), their approach is based on what they call the Sellke construction, with Tom Sellke being a probabilist and statistician at Purdue University (trivia: whom I met when spending a postdoctoral year there in 1987-1988) who also wrote a fundamental paper on the opposition between Bayes factors and p-values with Jim Berger.

“We chose as a measure of algorithm performance the largest Kolmogorov-Smirnov (KS) distance between the MCMC sample and true distribution amongst all the marginal distributions.”

The practical trick is rather straightforward in that it sums up as the exponentiation of the inverse cdf method, completed with a numerical resolution of the inversion. Based on the QAGS (Quadrature Adaptive Gauss-Kronrod Singularities) integration routine. In order to save time Kingman’s superposition trick only requires one inversion rather than d, the dimension of the variable of interest. This nuzzled version of ZIgZag can furthermore be interpreted as a PDMP per se. Except that it retains a numerical error, whose impact on convergence is analysed in the paper. In terms of Wasserstein distance between the invariant measures. The paper concludes with a numerical comparison between Nuzz and random walk Metropolis-Hastings, HMC, and manifold MALA, using the number of evaluations of the likelihood as a measure of time requirement. Tuning for Nuzz is described, but not for the competition. Rather dramatically the Nuzz algorithm performs worse than this competition when counting one epoch for each likelihood computation and better when counting one epoch for each integral inversion. Which amounts to perfect inversion, unsurprisingly. As a final remark, all models are more or less Normal, with very smooth level sets, maybe not an ideal range


IMS workshop [day 3]

Posted in pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , on August 30, 2018 by xi'an

I made the “capital” mistake of walking across the entire NUS campus this morning, which is quite green and pretty, but which almost enjoys an additional dimension brought by such an intense humidity that one feels having to get around this humidity!, a feature I have managed to completely erase from my memory of my previous visit there. Anyway, nothing of any relevance. oNE talk in the morning was by Markus Eisenbach on tools used by physicists to speed up Monte Carlo methods, like the Wang-Landau flat histogram, towards computing the partition function, or the distribution of the energy levels, definitely addressing issues close to my interest, but somewhat beyond my reach for using a different language and stress, as often in physics. (I mean, as often in physics talks I attend.) An idea that came out clear to me was to bypass a (flat) histogram target and aim directly at a constant slope cdf for the energy levels. (But got scared away by the Fourier transforms!)

Lawrence Murray then discussed some features of the Birch probabilistic programming language he is currently developing, especially a fairly fascinating concept of delayed sampling, which connects with locally-optimal proposals and Rao Blackwellisation. Which I plan to get back to later [and hopefully sooner than later!].

In the afternoon, Maria de Iorio gave a talk about the construction of nonparametric priors that create dependence between a sequence of functions, a notion I had not thought of before, with an array of possibilities when using the stick breaking construction of Dirichlet processes.

And Christophe Andrieu gave a very smooth and helpful entry to partly deterministic Markov processes (PDMP) in preparation for talks he is giving next week for the continuation of the workshop at IMS. Starting with the guided random walk of Gustafson (1998), which extended a bit later into the non-reversible paper of Diaconis, Holmes, and Neal (2000). Although I had a vague idea of the contents of these papers, the role of the velocity ν became much clearer. And premonitory of the advances made by the more recent PDMP proposals. There is obviously a continuation with the equally pedagogical talk Christophe gave at MCqMC in Rennes two months [and half the globe] ago,  but the focus being somewhat different, it really felt like a new talk [my short term memory may also play some role in this feeling!, as I now remember the discussion of Hilderbrand (2002) for non-reversible processes]. An introduction to the topic I would recommend to anyone interested in this new branch of Monte Carlo simulation! To be followed by the most recently arXived hypocoercivity paper by Christophe and co-authors.

Bouncing bouncy particle papers

Posted in Books, pictures, Statistics, University life with tags , , , , on July 27, 2017 by xi'an

Yesterday, two papers on bouncy particle samplers simultaneously appeared on arXiv, arxiv:1707.05200 by Chris Sherlock and Alex Thiery, and arxiv:1707.05296 by Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet. As a coordinated move by both groups of authors who had met the weeks before at the Isaac Newton Institute in Cambridge.

The paper by Sherlock and Thiery, entitled a discrete bouncy particle sampler, considers a delayed rejection approach that only requires point-wise evaluations of the target density. The delay being into making a speed flip move after a proposal involving a flip in the speed and a drift in the variable of interest is rejected. To achieve guaranteed ergodicity, they add a random perturbation as in our recent paper, plus another perturbation based on a Brownian argument. Given that this is a discretised version of the continuous-time bouncy particle sampler, the discretisation step δ need be calibrated. The authors follow a rather circumvoluted argument to argue in favour of seeking a maximum number of reflections (for which I have obviously no intuition). Overall, I find it hard to assess how much of an advance this is, even when simulations support the notion of a geometric convergence.

“Our results provide a cautionary example that in certain high-dimensional scenarios, it is still preferable to perform refreshment even when randomized bounces are used.” Vanetti et al.

The paper by Paul Vanetti and co-authors has a much more ambitious scale in that it unifies most of the work done so far in this area and relates piecewise deterministic processes, Hamiltonian Monte Carlo, and discrete versions, containing on top fine convergence results. The main idea is to improve upon the existing deterministic methods by taking (more) into account the target density. Hence the use of a bouncy particle sampler associated with the Hamiltonian (as in HMC). This borrows from an earlier slice sampler idea of Iain Murray, Ryan Adams, and David McKay (AISTATS 2010), exploiting an exact Hamiltonian dynamics for an approximation to the true target to explore its support. Except that bouncing somewhat avoids the slice step. The [eight] discrete bouncy particle particle samplers derived from this framework are both correct against the targeted distribution and do not require the simulation of event times. The paper distinguishes between global and local versions, the later exploiting conditional independence properties in the (augmented) target. Which sounds like a version of multiple slice sampling.