Archive for University of Oxford

MCqMC 2020 live and free and online

Posted in pictures, R, Statistics, Travel, University life on July 27, 2020 by xi'an

The MCqMC 2020 conference that was supposed to take place in Oxford on 9-14 August has been turned into a free online conference, since travelling remains a challenge for most of us. Tutorials and plenaries will be live with questions on Zoom, with live-streaming and recorded copies on YouTube. They will probably take place during 14:00-17:00 UK time (GMT+1), 15:00-18:00 CET (GMT+2), and 9:00-12:00 ET. (Which will prove a wee bit of a challenge for West Coast, Asian, and Australasian researchers, which is why, for our One World IMS-Bernoulli conference, we asked plenary speakers to duplicate their talks.) All other talks will be pre-recorded by contributors and uploaded to a website, with an online Q&A discussion section for each. As a reminder, here are the tutorials and plenaries:

Invited plenary speakers:

Aguêmon Yves Atchadé (Boston University)
Jing Dong (Columbia University)
Pierre L’Écuyer (Université de Montréal)
Mark Jerrum (Queen Mary University London)
Peter Kritzer (RICAM Linz)
Thomas Muller (NVIDIA)
David Pfau (Google DeepMind)
Claudia Schillings (University of Mannheim)
Mario Ullrich (JKU Linz)

Tutorials:

Fred Hickernell (IIT) — Software for Quasi-Monte Carlo Methods
Aretha Teckentrup (Edinburgh) — Markov chain Monte Carlo methods

non-reversible jump MCMC

Posted in Books, pictures, Statistics on June 29, 2020 by xi'an

Philippe Gagnon and Arnaud Doucet have recently arXived a paper on a non-reversible version of reversible jump MCMC, the methodology introduced by Peter Green in 1995 to tackle Bayesian model choice/comparison/exploration, which Philippe presented at BayesComp20.

“The objective of this paper is to propose sampling schemes which do not suffer from such a diffusive behaviour by exploiting the lifting idea (…)”

The idea is related to lifting, creating non-reversible behaviour by adding a direction index (a spin) to the exploration of the models, assumed to be totally ordered, as with nested models (mixtures, changepoints, &tc.). As with earlier versions of lifting, the chain proceeds along one (spin) direction until the proposal is rejected, in which case the spin flips. The acceptance probability in the event of a change of model (upwards or downwards) is essentially the same as the reversible one (meaning it includes the dreaded Jacobian!). The original difficulty with reversible jump remains active with non-reversible jump, in that the move from one model to the next must produce plausible values. The paper recalls two methods proposed by Christophe Andrieu and his co-authors. One consists in buffering a tempering sequence, but this proves costly. Pursuing the interesting underlying theme that both reversible and non-reversible versions are noisy approximations of the marginal ratio, the other one consists in marginalising out the parameter to approximate the marginal probability of moving between nearby models, combined with a multiple-choice step that preserves stationarity while selecting more likely moves. This still requires multiplying the number of simulations, but is parallelisable. The paper contains an exact comparison result showing that non-reversible jump leads to a smaller asymptotic variance than reversible jump, but it is unclear to me whether or not this accounts for the extra computing time resulting from the multiple paths in the proposed algorithms. (Even though the numerical illustration shows an improvement brought by the non-reversible side for the same computational budget.)
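To make the lifting mechanism concrete, here is a minimal R sketch of a spin-augmented walk on the model index alone, assuming the within-model parameters have been marginalised out, so that p() stands for hypothetical unnormalised marginal model probabilities and no Jacobian appears. This is only an illustration of the lifting idea, not the non-reversible jump algorithm of the paper:

```r
## lifted walk on a totally ordered model index k in 1..K: propose k + spin,
## accept with the usual Metropolis ratio, flip the spin upon rejection
lifted_model_walk <- function(p, K, n_iter = 1e4, k0 = 1) {
  k <- k0; spin <- 1            # spin = +1 explores upwards, -1 downwards
  ks <- integer(n_iter)
  for (t in seq_len(n_iter)) {
    kp <- k + spin              # deterministic proposal along the current spin
    if (kp >= 1 && kp <= K && runif(1) < p(kp) / p(k)) {
      k <- kp                   # accepted: keep travelling in the same direction
    } else {
      spin <- -spin             # rejected (or hit a boundary): flip the spin
    }
    ks[t] <- k
  }
  ks
}

## toy target: geometric marginal model probabilities over K = 10 models
out <- lifted_model_walk(function(k) 0.7^k, K = 10)
table(out) / length(out)        # close to 0.7^(1:10) / sum(0.7^(1:10))
```

On this toy target the chain sweeps up and down the model order instead of diffusing back and forth, which is precisely the behaviour the lifting is after.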

scalable Metropolis-Hastings, nested Monte Carlo, and normalising flows

Posted in Books, pictures, Statistics, University life on June 16, 2020 by xi'an

Over a sunny if quarantined Sunday, I started reading the PhD dissertation of Rob Cornish, Oxford University, as I am the external member of his viva committee, and ended up spending a highly pleasant afternoon discussing this thesis over a (remote) viva yesterday. (If bemoaning a lost opportunity to visit Oxford!) The introduction to the viva was most helpful and set the results within the different time and geographical zones of the PhD, since Rob had to switch from one group of advisors in Engineering to another group in Statistics. Plus an encompassing prospective discussion, expressing pessimism at exact MCMC for complex models and looking forward to further advances in probabilistic programming.

Made of three papers, the thesis includes this ICML 2019 [remember the era when there were conferences?!] paper on scalable Metropolis-Hastings, by Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, Georges Deligiannidis, and Arnaud Doucet, which I commented on last year. It achieves a remarkable and paradoxical O(1/√n) cost per iteration, provided (global) lower bounds can be found on the (local) Metropolis-Hastings acceptance probabilities, since these allow for Poisson thinning à la Devroye (1986), with second-order Taylor expansions constructed for all components of the target and third-order derivatives providing the bounds. However, the variability of the acceptance probability gets higher, which induces a longer, if still manageable, convergence time, provided the concentration of the posterior is in tune with the Bernstein-von Mises asymptotics. I had not paid enough attention on my first read to the strong theoretical justification for the method, relying on the convergence of MAP estimates in well- and (some) mis-specified settings. Now, I would have liked to see the paper dealing with a more complex problem than logistic regression.
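As a rough illustration of why per-datum bounds matter, here is a minimal R sketch, not of the scalable MH algorithm itself but of the simpler factorised (delayed-acceptance) scheme it builds upon, where the acceptance probability splits into per-observation factors that can be tested one at a time; loglik_i is a hypothetical per-observation log-likelihood and a flat prior is assumed for brevity:

```r
## factorised Metropolis-Hastings step: accept only if every per-datum factor
## min(1, r_i) accepts, checking factors sequentially so that a rejection can
## be decided after evaluating only a few observations (the scalable MH paper
## goes further, combining lower bounds on these factors with Poisson thinning
## to drive the expected per-iteration cost down to O(1/sqrt(n)))
factorised_mh_step <- function(theta, loglik_i, n, prop_sd = 0.1) {
  prop <- theta + rnorm(1, 0, prop_sd)          # symmetric random-walk move
  for (i in sample(n)) {                        # visit the factors in random order
    r_i <- exp(loglik_i(prop, i) - loglik_i(theta, i))
    if (runif(1) >= min(1, r_i)) return(theta)  # early rejection, stop scanning
  }
  prop                                          # all n factors accepted
}
```

The price of the factorisation is a smaller overall acceptance probability than with the standard Metropolis-Hastings ratio, hence the longer convergence time mentioned above.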

The second paper in the thesis is an ICML 2018 proceedings paper by Tom Rainforth, Robert Cornish, Hongseok Yang, Andrew Warrington, and Frank Wood, which considers Monte Carlo problems involving several nested expectations in a non-linear manner, meaning that (a) several levels of Monte Carlo approximations are required, with associated asymptotics, and (b) the resulting overall estimator is biased. This includes common doubly intractable posteriors, obviously, as well as (Bayesian) design and control problems. [And it has nothing to do with nested sampling.] The resolution chosen by the authors is strictly plug-in, in that they replace each level in the nesting with a Monte Carlo substitute and do not attempt to reduce the bias. Which means a wide range of solutions (other than the plug-in one) could have been investigated, including bootstrap maybe. For instance, Bayesian design is presented as an application of the approach, but since it relies on the log-evidence, there exist several versions for estimating (unbiasedly) this log-evidence. Similarly, the Forsythe-von Neumann technique applies to arbitrary transforms of a primary integral. The central discussion dwells on the optimal choice of the volume of simulations at each level, optimal in terms of asymptotic MSE, or rather an asymptotic bound on the MSE. The interesting result is that the outer expectation requires the square of the number of simulations used for the inner expectations, all of which need to converge to infinity. A trick for finding an estimator of a polynomial transform reminded me of the SAME algorithm, in that it duplicates the simulations as many times as the highest power of the polynomial. (The ‘Og briefly reported on this paper… four years ago.)
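Here is a minimal R sketch of the plug-in nested Monte Carlo estimator discussed above, for E_x[f(E_{y|x}[g(x,y)])] with a nonlinear f: each inner expectation is replaced by its own Monte Carlo average, which makes the overall estimator biased, and the outer sample size is taken as the square of the inner one, as in the asymptotic MSE analysis. The toy model is my own choice, not from the paper: x ~ N(0,1), y|x ~ N(x,1), g(x,y) = y, f(z) = z², so the true value is E[x²] = 1:

```r
## plug-in nested Monte Carlo with outer sample size N = M^2
nested_mc <- function(M) {
  N <- M^2                                  # outer budget = square of inner budget
  x <- rnorm(N)                             # outer samples
  inner <- vapply(x, function(xi) mean(rnorm(M, mean = xi)), numeric(1))
  mean(inner^2)                             # f applied to the noisy inner means
}
nested_mc(100)                              # biased upwards: E[estimate] = 1 + 1/M
```

In this toy case the inner average is N(x, 1/M), so the plug-in estimator has expectation 1 + 1/M, making the O(1/M) bias of the plug-in strategy explicit.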

The third and last part of the thesis is a proposal [to appear in ICML 20] on relaxing bijectivity constraints in normalising flows with continuously indexed flows (or CIF; as Rob made a joke about this cleaning brand, let me add (?) to that joke by mentioning that looking at CIF and bijections is less dangerous in a Trump cum COVID era than CIF and injections!), with Anthony Caterini, George Deligiannidis and Arnaud Doucet as co-authors. I am much less familiar with this area and hence a wee bit puzzled at the purpose of removing what I understand to be an appealing side of normalising flows, namely to produce a manageable representation of density functions as a combination of bijective and differentiable functions of a baseline random vector, like a standard Normal vector. The argument made in the paper is that this representation of the density imposes a constraint on the topology of its support, since said support is homeomorphic to the support of the baseline random vector. While the supporting theoretical argument is a mathematical theorem showing that the Lipschitz constant of the transform must grow to infinity when the supports are topologically different, these arguments may be overly theoretical when faced with the practical implications of the replacement strategy. I somewhat miss the overall strength of the approach, given that the whole point seems to be in approximating a density function based on a finite sample.
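For readers less familiar with normalising flows, here is a one-dimensional R sketch (mine, not from the thesis) of the change-of-variables identity that bijective flows rely on, which also makes the topological constraint visible, since the support of the transformed density is the homeomorphic image of the base support:

```r
## change of variables for a bijective flow x = f(z), z ~ N(0,1):
## log p_x(x) = log p_z(f_inv(x)) - log |f'(f_inv(x))|
## here f = exp maps R onto (0, Inf): the support of p_x is forced to be
## homeomorphic to that of the base density, the constraint CIFs aim to relax
f      <- function(z) exp(z)         # a bijection from R to (0, Inf)
f_inv  <- function(x) log(x)
log_px <- function(x)                # since f'(z) = exp(z), log|f'| = f_inv(x)
  dnorm(f_inv(x), log = TRUE) - f_inv(x)

log_px(2)                            # matches dlnorm(2, log = TRUE), log-Normal
```

Trying to fit a target with, say, two disconnected components through such an f is what drives the Lipschitz constant in the aforementioned theorem to infinity.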

Judith’s colloquium at Warwick

Posted in Statistics on February 21, 2020 by xi'an

MCqMC2020 key dates

Posted in pictures, Statistics, Travel, University life on January 23, 2020 by xi'an

A reminder of the key dates for the incoming MCqMC2020 conference this summer in Oxford:

Feb 28, Special sessions/minisymposia submission
Mar 13, Contributed abstracts submission
Mar 27, Acceptance notification
Mar 27, Registration starts
May 8, End of early bird registration
June 12, Speaker registration deadline
Aug 9-14, Conference

and of the list of plenary speakers:

Yves Atchadé (Boston University)
Jing Dong (Columbia University)
Pierre L’Écuyer (Université de Montréal)
Mark Jerrum (Queen Mary University London)
Gerhard Larcher (JKU Linz)
Thomas Muller (NVIDIA)
David Pfau (Google DeepMind)
Claudia Schillings (University of Mannheim)
Mario Ullrich (JKU Linz)