non-reversible Langevin samplers

On the train to Oxford last night, I read through the recently arXived Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation by Duncan et al. Standing up the whole trip, in the great tradition of British trains.

The paper is fairly theoretical and full of Foster-Lyapunov assumptions but aims at defending an approach based on a non-reversible diffusion. One idea is that the diffusion with drift {∇ log π(x) + γ(x)} keeps the target π as its stationary distribution provided

∇ · {π(x)γ(x)} = 0

which holds for the Langevin diffusion when γ(x)=0, but produces a non-reversible process in the alternative. The Langevin choice γ(x)=0 happens to be the worst possible when considering the asymptotic variance. In practice, however, the diffusion needs to be discretised, which induces an approximation that may be catastrophic for convergence if not corrected, and a relapse into reversibility if corrected by Metropolis. The proposal in the paper is to use a Lie-Trotter splitting, which I had never heard of before, to split between the reversible [∇ log π(x)] and non-reversible [γ(x)] parts of the process. The deterministic part is chosen as γ(x)=∇ log π(x) [but then what is the point since this is Langevin?] or as the gradient of a power of π(x). Although I was mostly lost by that stage, the paper then considers the error induced by a numerical integrator related to this deterministic part, towards deriving the asymptotic mean and variance for the splitting scheme, on the unit hypercube. Although the paper includes a numerical example for the warped normal target, I find it hard to visualise the implementation of this scheme. (Having obviously not heeded Nicolas’ and James’ advice, the authors also analyse the Pima Indian dataset by a logistic regression!)
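To help visualise what such a scheme could look like, here is a minimal sketch on a toy bivariate Gaussian target. It is not the paper's implementation and rests on several assumptions of mine: the non-reversible part is taken as γ(x) = δ J ∇ log π(x) with J a skew-symmetric matrix (a standard choice that satisfies the divergence condition above, since ∇ · {J ∇π(x)} = 0), the deterministic part is integrated by a classical Runge-Kutta step, and the reversible part by an unadjusted Euler-Maruyama step. The names SIGMA, DELTA, J and all function names are hypothetical, and without a Metropolis correction the output is biased at the order of the step size h.

```python
import numpy as np

# Assumed toy target (not from the paper): a zero-mean bivariate Gaussian.
SIGMA = np.array([[1.0, 0.5], [0.5, 1.0]])
PREC = np.linalg.inv(SIGMA)
# Skew-symmetric matrix: gamma(x) = DELTA * J @ grad_log_pi(x) then
# satisfies div(pi * gamma) = 0 for any smooth target pi.
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
DELTA = 1.0  # hypothetical strength of the non-reversible perturbation


def log_pi(x):
    """Log-density of the toy target, up to an additive constant."""
    return -0.5 * x @ PREC @ x


def grad_log_pi(x):
    """Gradient of the log-density (reversible part of the drift)."""
    return -PREC @ x


def gamma(x):
    """Non-reversible part of the drift (assumed form, see lead-in)."""
    return DELTA * J @ grad_log_pi(x)


def rk4_step(x, h):
    """One classical Runge-Kutta step for the ODE dx/dt = gamma(x)."""
    k1 = gamma(x)
    k2 = gamma(x + 0.5 * h * k1)
    k3 = gamma(x + 0.5 * h * k2)
    k4 = gamma(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)


def euler_maruyama_step(x, h, rng):
    """One unadjusted step for dX = grad log pi(X) dt + sqrt(2) dW."""
    noise = rng.standard_normal(x.shape)
    return x + h * grad_log_pi(x) + np.sqrt(2.0 * h) * noise


def splitting_sampler(x0, h, n_steps, seed=0):
    """Lie-Trotter splitting: deterministic step, then stochastic step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = rk4_step(x, h)                  # non-reversible part
        x = euler_maruyama_step(x, h, rng)  # reversible Langevin part
        path[k] = x
    return path


def divergence_pi_gamma(x, eps=1e-5):
    """Central finite-difference estimate of div(pi * gamma) at x."""
    def flux(y):
        return np.exp(log_pi(y)) * gamma(y)
    div = 0.0
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        div += (flux(x + e)[i] - flux(x - e)[i]) / (2 * eps)
    return div


if __name__ == "__main__":
    path = splitting_sampler([2.0, -2.0], h=0.05, n_steps=50_000)
    print("sample mean:", path.mean(axis=0))
    print("sample covariance:\n", np.cov(path.T))
    x = np.array([0.3, -0.7])
    print("div(pi * gamma) at x:", divergence_pi_gamma(x))
```

For this Gaussian toy case the exact flow of the deterministic part moves along the level sets of π, which is why composing it with a Langevin step leaves the target (approximately, given the discretisation) unchanged while breaking reversibility. Setting DELTA to 0 recovers the plain unadjusted Langevin algorithm for comparison.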

As in previous years, I am back in Oxford (England) for my short Bayesian Statistics course in the joint Oxford-Warwick PhD programme, OxWaSP. For some unclear reason, presumably related to the Internet connection from Oxford, I have not been able to upload my slides to Slideshare, so here is the [99.9% identical] older version:

[Here is a call for a two-year postdoc in Oxford sent to me by Arnaud Doucet. For those worried about moving to Britain, I think that, given the current pace—or lack thereof—of the negotiations with the EU, it is very likely that Britain will not have Brexited two years from now.]

Numerous medical problems, ranging from screening to diagnosis to treatment of chronic diseases to management of care in hospitals, require the development of novel statistical models and methods. These models and methods need to address the unique characteristics of medical data such as sampling bias, heterogeneity, non-stationarity, informative censoring, etc. Existing state-of-the-art machine learning and statistics techniques often fail to exploit those characteristics. Additionally, the focus needs to be on probabilistic models which are interpretable by the clinicians so that the inference results can be integrated within medical decision-making.

We have access to unique datasets for clinical deterioration of patients in the hospital, for cancer screening, and for treatment of chronic diseases. Preliminary work has been tested and implemented at UCLA Medical Center, resulting in significantly improved management of care in this hospital.

The successful applicant will be expected to develop new probabilistic models and learning methods inspired by these applications. The focus will be primarily on methodological and theoretical developments, and involve collaborating with Oxford researchers in machine learning, computational statistics and medicine to bring these developments to practice.

The post-doctoral researcher will be jointly supervised by Prof. Mihaela van der Schaar and Prof. Arnaud Doucet. Both of them have a strong track-record in advising PhD students and post-doctoral researchers who subsequently became successful academics in statistics, engineering sciences, computer science and economics. The position is for 2 years.

As I was unsure of the Internet connections and of the more than likely delays I would face during my trip to India, I went fishing for a massive novel on Amazon and eventually ordered Peter Hamilton’s Great North Road, a 1088-page behemoth! I fear the book qualifies as space opera, with the conventional load of planet invasions, incomprehensible and infinitely wise aliens, gateways for instantaneous space travel, and sentient biospheres. But the core of the story is very, very Earth-bound, with a detective story taking place in a future Newcastle that is not so distant from now in many ways. (Or even from the past, as the 2012 book did not forecast Brexit…) With an occurrence of the town moor where I went running a few years ago.

The book is mostly well-designed, with a plot gripping enough to keep me hooked for Indian evenings in Kolkata and most of the flight back. I actually finished it just before landing in Paris. There is no true depth in the story, though, and the science fiction part is rather lame: a very long part of the detective plot is spent on the hunt for a taxi by an army of detectives, a task one would think should be delegated to a machine-learning algorithm and solved in a nanosecond or so. The themes heavily borrow from those of classics like Avatar, Speaker for the Dead, Hyperion [very much Hyperion!], Alien… And from The Girl with the Dragon Tattoo for a hardcore heroine who is perfect at anything she undertakes. Furthermore, the Earth at the centre of this extended universe is very close to its present version, with English-style taxis, pub culture, and a geopolitical structure of the world pretty much unchanged. Plus main brands identical to current ones (Apple, BMW, &c.), to the point that it sounds like sponsored links! And no hint of major climate change despite the continued use of fuel engines. Nonetheless, an easy read when stuck in an airport or a plane seat for several hours.

Here are the slides of the presentation I gave at the EPSRC Advanced Computational methods for complex models in Biology at University College London, last week. Introducing random forests as proper summaries for both model choice and parameter estimation (with considerable overlap with earlier slides, obviously!). The other talks of that highly interesting day on computational biology were mostly about ancestral graphs, using Wright-Fisher diffusions for coalescents, plus a comparison of expectation-propagation and ABC on a genealogy model by Mark Beaumont and the decision-theoretic approach to HMM order estimation by Chris Holmes. In addition, it gave me the opportunity to come back to the Department of Statistics at UCL more than twenty years after my previous visit, at a time when my friend Costas Goutis was still there. And to realise it had moved from its historical premises years ago. (I wonder what happened to the two staircases built to reduce friction between Fisher and Pearson, if I remember correctly…)