## high-dimensional stochastic simulation and optimisation in image processing [day #3]

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on August 31, 2014 by xi'an

Last and maybe most exciting day of the “High-dimensional Stochastic Simulation and Optimisation in Image Processing” in Bristol as it was exclusively about simulation (MCMC) methods. Except my own talk on ABC. And Peter Green’s on consistency of Bayesian inference in non-regular models. The talks today were indeed about using convex optimisation devices to speed up MCMC algorithms with tools that were entirely new to me, like the Moreau transform discussed by Marcelo Pereyra. Or using auxiliary variables à la RJMCMC to bypass expensive Choleski decompositions. Or optimisation steps from one dual space to the original space for the same reason. Or using pseudo-gradients on partly differentiable functions in the talk by Sylvain Lecorff on a paper commented earlier in the ‘Og. I particularly liked the notion of Moreau regularisation that leads to more efficient Langevin algorithms when the target is not regular enough. Actually, the discretised diffusion itself may be geometrically ergodic without the corrective step of the Metropolis-Hastings acceptance. This obviously begs the question of an extension to Hamiltonian Monte Carlo. And to multimodal targets, possibly requiring as many normalisation factors as there are modes. So, in fine, a highly informative workshop, with the perfect size and the perfect crowd (which happened to be predominantly French, albeit from a community I did not have the opportunity to practice previously). Massive kudos to Marcello for putting this workshop together, esp. on a week where family major happy events should have kept him at home!

As the workshop ended up in mid-afternoon, I had plenty of time for a long run with Florence Forbes down to the Avon river and back up among the deers of Ashton Court, avoiding most of the rain, all of the mountain bikes on a bike trail that sounded like trail running practice, and building enough of an appetite for the South Indian cooking of the nearby Thali Café. Brilliant!

## high-dimensional stochastic simulation and optimisation in image processing [day #1]

Posted in pictures, Statistics, Travel, Uncategorized, University life, Wines with tags , , , , , , , , , , , on August 29, 2014 by xi'an

Even though I flew through Birmingham (and had to endure the fundamental randomness of trains in Britain), I managed to reach the “High-dimensional Stochastic Simulation and Optimisation in Image Processing” conference location (in Goldney Hall Orangery) in due time to attend the (second) talk by Christophe Andrieu. He started with an explanation of the notion of controlled Markov chain, which reminded me of our early and famous-if-unpublished paper on controlled MCMC. (The label “controlled” was inspired by Peter Green who pointed out to us the different meanings of controlled in French [meaning checked or monitored] and in English . We use it here in the English sense, obviously.) The main focus of the talk was on the stability of controlled Markov chains. With of course connections with out controlled MCMC of old, for instance the case of the coerced acceptance probability. Which happened to be not that stable! With the central tool being Lyapounov functions. (Making me wonder whether or not it would make sense to envision the meta-problem of adaptively estimating the adequate Lyapounov function from the MCMC outcome.)

As I had difficulties following the details of the convex optimisation talks in the afternoon, I eloped to work on my own and returned to the posters & wine session, where the small number of posters allowed for the proper amount of interaction with the speakers! Talking about the relevance of variational Bayes approximations and of possible tools to assess it, about the use of new metrics for MALA and of possible extensions to Hamiltonian Monte Carlo, about Bayesian modellings of fMRI and of possible applications of ABC in this framework. (No memorable wine to make the ‘Og!) Then a quick if reasonably hot curry and it was already bed-time after a rather long and well-filled day!z

## Nonlinear Time Series just appeared

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , , , , on February 26, 2014 by xi'an

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The books reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The later includes MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”) more adequate for PhD students and researchers than starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) comes to balance the mathematical bent of the book in the first and third parts. A great reference book!

## Sean Meyn in Paris

Posted in Books, Statistics, Travel with tags , , , , , , , on November 23, 2013 by xi'an

My friend Sean Meyn (from the University of Florida, Gainesville) will give a talk in Paris next week (and I will be away in Coventry at the time…). Here are the details:

Mardi 26 novembre 2013 à 14h00
Salle de Conseil, 4ème étage (LINCS) 23 AVENUE D’ITALIE 75013 PARIS

Titre de l’exposé : Feature Selection for Neuro-Dynamic Programming

Neuro-Dynamic Programming encompasses techniques from both reinforcement learning and approximate dynamic programming. Feature selection refers to the choice of basis that defines the function class that is required in the application of these techniques. This talk reviews two popular approaches to neuro-dynamic programming, TD-learning and Q-learning. The main goal of this work is to demonstrate how insight from idealized models can be used as a guide for feature selection for these algorithms. Several approaches are surveyed, including fluid and diffusion models, and the application of idealized models arising from mean-field game approximations. The theory is illustrated with several examples.

## snapshot from Budapest (EMS 2013 #3)

Posted in pictures, Statistics, University life, Wines with tags , , , , , on July 25, 2013 by xi'an

Here is the new version of the talk:

And I had a fairly interesting day at the conference, from Randal’s talk on hidden Markov models with finite valued observables to the two Terrys invited session (Terry Lyons vs. Terry Speed) to the machine learning session organised by a certain Michal Jordan (on the program) that turned out to be Michael Jordan (with a talk on the fusion between statistics and computability). A post-session chat with Terry Lyons also opened (to me) new perspectives on data summarisation. (And we at last managed to get a convergence result using a Rao-Blackwellisation argument!) Plus, we ended up the day in a nice bistrot called Zeller with an awfully friendly staff cooking family produces and serving fruity family wines and not yet spoiled by being ranked #1 on tripadvisor (but visibly attracting a lot of tourists like us).

Posted in Statistics with tags , , , , , , , on December 21, 2012 by xi'an

Today my student Xiaolin Cheng presented the mythical 1990 JASA paper of Alan Gelfand and Adrian Smith, Sampling-based approaches to calculating marginal densities. The very one that started the MCMC revolution of the 1990’s! Re-reading it through his eyes was quite enlightening, even though he stuck quite closely to the paper. (To the point of not running his own simulation, nor even reporting Gelfand and Smith’s, as shown by the slides below. This would have helped, I think…)

Indeed, those slides focus very much on the idea that such substitution samplers can provide parametric approximations to the marginal densities of the components of the simulated parameters. To the point of resorting to importance sampling as an alternative to the standard Rao-Blackwell estimate, a solution that did not survive long. (We briefly discussed this point during the seminar, as the importance function was itself based on a Rao-Blackwell estimate, with possibly tail issues. Gelfand and Smith actually conclude on the higher efficiency of the Gibbs sampler.) Maybe not so surprisingly, the approximation of the “other” marginal, namely the marginal likelihood, as it is much more involved (and would lead to the introduction of the infamous harmonic mean estimator a few years later! And Chib’s (1995), which is very close in spirit to the Gibbs sampler). While Xiaolin never mentioned Markov chains in his talk, Gelfand and Smith only report that Gibbs sampling is a Markovian scheme, and refer to both Geman and Geman (1984) and Tanner and Wong (1987), for convergence issues. Rather than directly invoking Markov arguments as in Tierney (1994) and others. A fact that I find quite interesting, a posteriori, as it highlights the strong impact Meyn and Tweedie would have, three years later.

## lemma 7.3

Posted in Statistics with tags , , , , , , , , , , , on November 14, 2012 by xi'an

As Xiao-Li Meng accepted to review—and I am quite grateful he managed to fit this review in an already overflowing deanesque schedule!— our 2004 book  Monte Carlo Statistical Methods as part of a special book review issue of CHANCE honouring the memory of George thru his books—thanks to Sam Behseta for suggesting this!—, he sent me the following email about one of our proofs—demonstrating how much efforts he had put into this review!—:

I however have a question about the proof of Lemma 7.3
on page 273. After the expression of
E[h(x^(1)|x_0], the proof stated "and substitute
Eh(x) for h(x_1)".  I cannot think of any
justification for this substitution, given the whole
purpose is to show h(x) is a constant.

I put it on hold for a while and only looked at it in the (long) flight to Chicago. Lemma 7.3 in Monte Carlo Statistical Methods is the result that the Metropolis-Hastings algorithm is Harris recurrent (and not only recurrent). The proof is based on the characterisation of Harris recurrence as having only constants for harmonic functions, i.e. those satisfying the identity

$h(x) = \mathbb{E}[h(X_t)|X_{t-1}=x]$

The chain being recurrent, the above implies that harmonic functions are almost everywhere constant and the proof steps from almost everywhere to everywhere. The fact that the substitution above—and I also stumbled upon that very subtlety when re-reading the proof in my plane seat!—is valid is due to the fact that it occurs within an integral: despite sounding like using the result to prove the result, the argument is thus valid! Needless to say, we did not invent this (elegant) proof but took it from one of the early works on the theory of Metropolis-Hastings algorithms, presumably Luke Tierney’s foundational Annals paper work that we should have quoted…

As pointed out by Xiao-Li, the proof is also confusing for the use of two notations for the expectation (one of which is indexed by f and the other corresponding to the Markov transition) and for the change in the meaning of f, now the stationary density, when compared with Theorem 6.80.