## estimating constants [survey]

Posted in Books, pictures, Statistics, University life on February 2, 2017 by xi'an

A new survey on Bayesian inference with intractable normalising constants was posted on arXiv yesterday by Jaewoo Park and Murali Haran. A rather massive work of 58 pages, almost handy for a short course on the topic! In particular, it goes through the most common MCMC methods with a detailed description, followed by comments on components to be calibrated and the potential theoretical backup. This includes for instance the method of Liang et al. (2016) that I reviewed a few months ago. As well as the Wang-Landau technique we proposed with Yves Atchadé and Nicolas Lartillot. And the noisy MCMC of Alquier et al. (2016), also reviewed a few months ago. (The Russian Roulette solution is only mentioned very briefly as “computationally very expensive”. But still used in some illustrations. The whole area of pseudo-marginal MCMC is also missing from the picture.)

“…auxiliary variable approaches tend to be more efficient than likelihood approximation approaches, though efficiencies vary quite a bit…”

The authors distinguish between MCMC methods where the normalizing constant is approximated and those where it is omitted by an auxiliary representation. The survey also distinguishes between asymptotically exact and asymptotically inexact solutions. For instance, using a finite number of MCMC steps instead of the associated target results in an asymptotically inexact method. The question that remains open is what to do with the output, i.e., whether or not there is a way to correct for this error. In the illustration for the Ising model, the double Metropolis-Hastings version of Liang et al. (2010) achieves for instance massive computational gains, but also exhibits a persistent bias that would go undetected were it the sole method implemented. This aspect of approximate inference is not really explored in the paper, but constitutes a major issue for modern statistics (and machine learning as well, when inference is taken into account.)
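To make the "omitted by an auxiliary representation" idea concrete, here is a minimal sketch of an exchange-type auxiliary variable sampler on a toy model of my own choosing, not an example from the survey. It assumes y_i has density proportional to exp(-θy) on y > 0, with a Gamma(a, b) prior on θ, and pretends that the constant Z(θ) = 1/θ is unknown. Because auxiliary data can be simulated exactly from this model, the sampler is asymptotically exact; replacing that exact draw with a few MCMC steps would give a double Metropolis-Hastings flavour and the asymptotically inexact behaviour discussed above. The function name and all parameters are hypothetical.

```python
import math
import random


def exchange_sampler(y, n_iter=20000, step=0.5, a=1.0, b=1.0, seed=0):
    """Auxiliary variable (exchange-type) sampler on a toy model.

    Assumed model: y_i ~ f(y | theta) proportional to exp(-theta * y), y > 0.
    Assumed prior: theta ~ Gamma(a, rate=b).
    The normalising constant Z(theta) is never evaluated.
    """
    rng = random.Random(seed)
    n, s = len(y), sum(y)

    def log_q(total, theta):
        # Unnormalised log-likelihood of n points whose sum is `total`.
        return -theta * total

    def log_prior(theta):
        return (a - 1.0) * math.log(theta) - b * theta

    theta, chain = 1.0, []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)  # symmetric random walk proposal
        if prop > 0.0:
            # Auxiliary data set drawn exactly from the model at the proposal.
            s_aux = sum(rng.expovariate(prop) for _ in range(n))
            # Z(theta) and Z(prop) cancel between the data and auxiliary terms.
            log_ratio = (log_prior(prop) - log_prior(theta)
                         + log_q(s, prop) - log_q(s, theta)
                         + log_q(s_aux, theta) - log_q(s_aux, prop))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                theta = prop
        chain.append(theta)
    return chain
```

In this conjugate toy case the exact posterior is Gamma(a + n, b + Σy), which gives a way to check the output that is not available in genuinely intractable models.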

In conclusion, this survey provides a serious exploration of recent MCMC methods. It begs for a second part involving particle filters, which have often proven to be faster and more efficient than MCMC methods, at least in state space models. In that regard, Nicolas Chopin and James Ridgway examined further techniques when calling to leave the Pima Indians [dataset] alone.

## anytime!

Posted in Books, Mountains, pictures, Statistics, Travel on December 22, 2016 by xi'an

“An anytime algorithm is an algorithm that can be run continuously, generating progressively better solutions when afforded additional computation time. Traditional particle-based inference algorithms are not anytime in nature; all particles need to be propagated in lock-step to completion in order to compute expectations.”

Following a discussion with Lawrence Murray last week, I read Paige et al.’s NIPS 2014 paper on their anytime sequential Monte Carlo algorithm. As explained above, an anytime algorithm is interruptible, meaning it can be stopped at any time without biasing the outcome of the algorithm. While MCMC algorithms can qualify as anytime (provided they are in stationary regime), it is not the case with sequential and particle Monte Carlo algorithms, which do not have a built-in growing mechanism preserving the target. In the case of Paige et al.’s proposal, the interruptible solution returns an unbiased estimator of the marginal likelihood at time n for any number of particles, even when this number is set or increased during the computation. The idea behind the solution is to create a particle cascade by going one particle at a time and creating children of this particle in proportion to the current average weight. An approach that can be run indefinitely. And since memory is not infinite, the authors explain how to cap the number of alive particles without putting the running distribution in jeopardy…

## estimating constants [impression soleil levant]

Posted in pictures, Running, Statistics, Travel, University life on April 25, 2016 by xi'an

The CRiSM workshop on estimating constants which took place here in Warwick from April 20 till April 22 was quite enjoyable [says most objectively one of the organisers!], with all speakers present to deliver their talks (!) and around sixty participants, including 17 posters. It remains an exciting aspect of the field that so many and so different perspectives are available on the “doubly intractable” problem of estimating a normalising constant. Several talks and posters concentrated on Ising models, which always sound a bit artificial to me, but also are perfect testing grounds for approximations to classical algorithms.

On top of [clearly interesting!] talks associated with papers I had already read [and commented here], I had not previously heard about Pierre Jacob’s coupling SMC sequence, which paper is not yet out [no spoiler then!]. Or about Michael Betancourt’s adiabatic Monte Carlo and its connection with the normalising constant. Nicolas Chopin talked about the unnormalised Poisson process I discussed a while ago, with this feature that the normalising constant itself becomes an additional parameter. And that integration can be replaced with (likelihood) maximisation. The approach, which is based on a reference distribution (and an artificial logistic regression à la Geyer), reminded me of bridge sampling. And indirectly of path sampling, esp. when Merrilee Hurn gave us a very cool introduction to power posteriors in the following talk. Also mentioning the controlled thermodynamic integration of Chris Oates and co-authors I discussed a while ago. (Too bad that Chris Oates could not make it to this workshop!) And also pointing out that thermodynamic integration could be a feasible alternative to nested sampling.
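For readers who have not met power posteriors, the identity behind thermodynamic integration is log Z = ∫₀¹ E_t[log L(θ)] dt, where the expectation is under the power posterior proportional to prior × L^t. Below is a minimal sketch on a conjugate toy model of my own choosing (y_i ~ N(θ, 1), θ ~ N(0, 1)), not one from the talks, where each power posterior is Gaussian and can be sampled exactly. The temperature schedule t_i = (i/K)^5 and the trapezoidal rule are common choices that I am assuming here; the function names are hypothetical.

```python
import math
import random


def log_evidence_power_posteriors(y, n_temps=50, n_draws=2000, seed=1):
    """Thermodynamic integration estimate of log Z for y_i ~ N(theta, 1),
    theta ~ N(0, 1), sampling each power posterior exactly."""
    rng = random.Random(seed)
    n, s1, s2 = len(y), sum(y), sum(v * v for v in y)

    def log_lik(theta):
        # sum_i (y_i - theta)^2 expanded through the sufficient statistics.
        sq = s2 - 2.0 * theta * s1 + n * theta * theta
        return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * sq

    # Temperatures concentrated near t = 0, where the integrand moves fastest.
    temps = [(i / n_temps) ** 5 for i in range(n_temps + 1)]
    means = []
    for t in temps:
        prec = 1.0 + t * n  # power posterior is N(t * s1 / prec, 1 / prec)
        mu, sd = t * s1 / prec, 1.0 / math.sqrt(prec)
        total = sum(log_lik(rng.gauss(mu, sd)) for _ in range(n_draws))
        means.append(total / n_draws)

    # Trapezoidal rule over the temperature ladder.
    return sum(0.5 * (temps[i + 1] - temps[i]) * (means[i + 1] + means[i])
               for i in range(n_temps))


def exact_log_evidence(y):
    """Closed-form log evidence for the same toy model, for comparison."""
    n, s1, s2 = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(1.0 + n)
            - 0.5 * (s2 - s1 * s1 / (1.0 + n)))
```

In a realistic model each power posterior would have to be sampled by MCMC, which adds a second source of error on top of the discretisation in t.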

Another novel aspect was found in Yves Atchadé’s talk about sparse high-dimension matrices with priors made of mutually exclusive measures and quasi-likelihood approximations. A simplified version of the talk being in having a non-identified non-constrained matrix later projected onto one of those measure supports. While I was aware of his noise-contrastive estimation of normalising constants, I had not previously heard Michael Gutmann give a talk on that approach (linking to Geyer’s 1994 mythical paper!). And I do remain nonplussed at the possibility of including the normalising constant as an additional parameter [in a computational and statistical sense]..! Both Chris Sherlock and Christophe Andrieu talked about novel aspects on pseudo-marginal techniques, Chris on the lack of variance reduction brought by averaging unbiased estimators of the likelihood and Christophe on the case of large datasets, recovering better performances in latent variable models by estimating the ratio rather than taking a ratio of estimators. (With Christophe pointing out that this was an exceptional case when harmonic mean estimators could be considered!)
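To see what treating the normalising constant as an additional parameter can look like in practice, here is a minimal sketch of the logistic regression trick, in the spirit of Geyer's approach and of noise-contrastive estimation as I understand them. It is a toy of my own, not code from any of the talks. Samples from the target, known only up to a constant, are pooled with samples from a fully known reference density, and the intercept c of a logistic classifier separating the two groups estimates -log Z. The target exp(-x²/8), the N(0, 3²) reference and the function name are assumptions made for illustration.

```python
import math
import random


def log_z_by_logistic_regression(n=5000, seed=2):
    """Estimate log Z for p_tilde(x) = exp(-x^2 / 8) by maximising a logistic
    likelihood in the single intercept parameter c = -log Z."""
    rng = random.Random(seed)
    sigma_p, sigma_q = 2.0, 3.0

    def log_p_tilde(x):  # unnormalised target
        return -0.5 * (x / sigma_p) ** 2

    def log_q(x):  # normalised reference density N(0, sigma_q^2)
        return (-0.5 * (x / sigma_q) ** 2
                - math.log(sigma_q * math.sqrt(2.0 * math.pi)))

    # The toy target happens to be Gaussian, so it can be sampled directly.
    data = [rng.gauss(0.0, sigma_p) for _ in range(n)]
    ref = [rng.gauss(0.0, sigma_q) for _ in range(n)]
    g_data = [log_p_tilde(x) - log_q(x) for x in data]
    g_ref = [log_p_tilde(x) - log_q(x) for x in ref]

    def sigmoid(z):
        if z >= 0.0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def score(c):
        # Derivative in c of the logistic log-likelihood; decreasing in c.
        return (sum(1.0 - sigmoid(g + c) for g in g_data)
                - sum(sigmoid(g + c) for g in g_ref))

    lo, hi = -20.0, 20.0
    for _ in range(60):  # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)  # estimate of log Z
```

The true value here is log(2√(2π)) ≈ 1.612, so the integration problem has indeed been traded for a one-dimensional maximisation.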

## SPA 2015 Oxford [my day #2]

Posted in pictures, Statistics, Travel, University life on July 17, 2015 by xi'an

Today I [barely made it on a delayed train from Leamington Spa to Oxford as I] chaired my invited session at SPA 2015 on advanced MCMC methodology. The three speakers, Randal Douc, Mike Pitt and Matti Vihola, all gave talks related to the pseudo-marginal technique. For instance, Randal gave examples of guaranteed variance improvements by adding randomisation steps in the generation of the rv’s behind the unbiased estimation of the likelihood function. Mike Pitt presented the paper I discussed a little while ago about evaluating the computing performances of pseudo-marginal approximations, with a fairly compelling perspective [I may have missed from the paper] on approximating the distribution of the approximation to the log-likelihood as a normal. Which led me to ponder the ultimate version where the log-likelihood itself would get directly simulated in an MCMC algorithm bypassing the preliminary simulation of the parameters. Sounds a bit too fantasy-like to be of any use… Matti Vihola also presented recent results with Christophe Andrieu on comparing pseudo-marginal approximations, based on convex ordering properties. They included a domination result on ABC-MCMC algorithms, as noted in a recent post. Which made me muse about the overall importance of unbiasedness in the global picture, where all we need are converging approximations, in fine.
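As a reminder of what the pseudo-marginal technique does, here is a minimal sketch on a toy latent variable model of my own choosing, not taken from any of the three talks: z ~ N(θ, 1), y | z ~ N(z, 1), with a N(0, 2²) prior on θ. The likelihood N(y; θ, 2) is pretended unavailable and replaced with an unbiased average over simulated latent variables. The key point is that the estimate attached to the current value is recycled rather than refreshed. The function name and tuning values are hypothetical.

```python
import math
import random


def pseudo_marginal_chain(y=1.5, n_iter=20000, n_latent=20, step=1.5, seed=3):
    """Pseudo-marginal random walk Metropolis-Hastings on a toy model."""
    rng = random.Random(seed)

    def lik_hat(theta):
        # Unbiased estimate of p(y | theta): average N(y; z, 1) over draws
        # of the latent z from N(theta, 1).
        total = 0.0
        for _ in range(n_latent):
            z = rng.gauss(theta, 1.0)
            total += math.exp(-0.5 * (y - z) ** 2) / math.sqrt(2.0 * math.pi)
        return total / n_latent

    def prior(theta):
        return math.exp(-0.5 * (theta / 2.0) ** 2)  # N(0, 2^2), unnormalised

    theta = 0.0
    w = lik_hat(theta)  # estimate attached to the current state
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        w_prop = lik_hat(prop)
        # Accept with probability min(1, ratio of estimated posteriors);
        # the current estimate w is reused, never recomputed.
        if rng.random() * w * prior(theta) < w_prop * prior(prop):
            theta, w = prop, w_prop
        chain.append(theta)
    return chain
```

In this Gaussian toy case the exact posterior is N(1, 4/3) when y = 1.5, so the chain average can be compared with 1. Increasing `n_latent` reduces the variance of the likelihood estimate and hence the stickiness of the chain, at a proportional computing cost.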

## Stochastic volatility filtering with intractable likelihoods

Posted in Books, Statistics, University life on May 23, 2014 by xi'an

“The contribution of our work is two-fold: first, we extend the SVM literature, by proposing a new method for obtaining the filtered volatility estimates. Second, we build upon the current ABC literature by introducing the ABC auxiliary particle filter, which can be easily applied not only to SVM, but to any hidden Markov model.”

Another ABC arXival: Emilian Vankov and Katherine B. Ensor posted a paper with the above title. They consider a stochastic volatility model with an α-stable distribution on the observables (or returns). Which makes the likelihood unavailable, even were the hidden Markov sequence known… Now, I find it very surprising that the authors do not mention the highly relevant paper of Peters, Sisson and Fan, Likelihood-free Bayesian inference for α-stable models, published in CSDA, in 2012, where an ABC algorithm is specifically designed for handling α-stable likelihoods. (Commented on that earlier post.) Similarly, the use of a particle filter coupled to ABC seems to be advanced as a novelty when many researchers have implemented such filters, including Pierre Del Moral, Arnaud Doucet, Ajay Jasra, Sumeet Singh and others, in similar or more general settings. Furthermore, Simon Barthelmé and Nicolas Chopin analysed this very model by EP-ABC and ABC. I thus find it a wee bit hard to pinpoint the degree of innovation contained in this new ABC paper.

## MCqMC 2014 [day #1]

Posted in pictures, Running, Statistics, Travel, University life on April 9, 2014 by xi'an

As I have been kindly invited to give a talk at MCqMC 2014, here I am, in Leuven, Belgium, for this conference I have never attended before. (I was also invited for MCqMC 2012 in Sydney.) The talk topics and the attendees’ “sociology” are quite similar to those of the IMACS meeting in Annecy last summer. Namely, rather little on MCMC, particle filters, and other tools familiar in Bayesian computational statistics, but a lot on diffusions and stochastic differential equations and of course quasi-Monte Carlo methods. I thus find myself at a boundary of the conference range and a wee bit lost by some talks, whose very titles make little sense to me.

For instance, I have trouble connecting with multi-level Monte Carlo within my own frame of reference. My understanding of the method is one of a control variate version of tempering, namely of using a sequence of approximations to the true target and using rougher approximations as control variates for the finer approximations. But I cannot find on the Web a statistical application of the method outside of diffusions and SDEs, i.e. outside of continuous time processes… Maybe using a particle filter from one approximation to the next, down in terms of roughness, could help.

“Several years ago, Giles (2008) introduced an intriguing multi-level idea to deal with such biased settings that can dramatically improve the rate of convergence and can even, in some settings, achieve the canonical “square root” convergence rate associated with unbiased Monte Carlo.” Rhee and Glynn, 2012

Those were my thoughts before lunchtime today (namely April 7, 2014). And then, after lunch, Peter Glynn gave his plenary talk that just answered those questions of mine!!! Essentially, he showed the formula Pierre Jacob also used in his Bernoulli factory paper to transform a converging-biased-into-an-unbiased estimator, based on a telescopic series representation and a random truncation… This approach is described in a paper with Chang-han Rhee, arXived a few years ago. The talk also covered more recent work (presumably related to Chang-han Rhee’s thesis) extending the above to Markov chains. As explained to me later by Pierre Jacob [of Statisfaction fame!], a regular chain does not converge fast enough to compensate for the explosive behaviour of the correction factor, which is why Rhee and Glynn used instead a backward chain, linking to the exact or perfect samplers of the 1990’s (whose origin can be related to a 1992 paper of Asmussen, Glynn and Thorisson). This was certainly the most riveting talk I attended in the past years in that it brought a direct answer to a question I was starting to investigate. And more. I was also wondering how connected it was with our “exact” representation of the stationary distribution (in an Annals of Probability paper with Jim Hobert), since we use a stopping rule based on renewal and a geometric waiting time, a somewhat empirical version of the inverse probability found in Peter’s talk. This talk also led me to re-consider a recent discussion we had in my CREST office with Andrew about using square root(ed) importance weights, since one of Peter’s slides exhibited those square roots as optimal. Paradoxically, Peter started the talk by down-playing it, stating there was a single idea therein and a single important slide, making it a perfect after-lunch talk: I wish I had actually had thrice more time to examine each slide!
(In the afternoon session, Éric Moulines also gave a thought-provoking talk on particle islands and double bootstrap, a research project I will comment on in more detail the day it gets arXived.)
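The telescopic series plus random truncation idea from Peter Glynn’s talk can be sketched in a few lines. If Y_n converges to Y, write Δ_0 = Y_0 and Δ_n = Y_n − Y_{n−1}, draw a random truncation level N, and return Z = Σ_{n≤N} Δ_n / P(N ≥ n), which has expectation Σ_n Δ_n = lim Y_n provided the series and the variance behave. The code below is a deliberately degenerate illustration of my own, with deterministic increments Δ_n = 1/n! (so the limit is e) and a geometric truncation; it is not the estimator from the Rhee and Glynn paper, where the increments are themselves random and coupled across levels. All names are hypothetical.

```python
import math
import random


def debiased_estimate(increment, p_continue, rng):
    """Randomly truncated telescoping sum with a geometric truncation level.

    Returns Z = sum_{n=0}^{N} increment(n) / P(N >= n),
    where P(N >= n) = p_continue ** n.
    """
    total, n, survival = 0.0, 0, 1.0
    while True:
        total += increment(n) / survival  # reweight by 1 / P(N >= n)
        if rng.random() >= p_continue:    # stop with prob. 1 - p_continue
            return total
        n += 1
        survival *= p_continue


def taylor_increment(n):
    # Delta_n for the partial sums Y_n = sum_{k <= n} 1 / k!, converging to e.
    return 1.0 / math.factorial(n)


def average_debiased(n_rep=40000, p_continue=0.5, seed=4):
    """Average of independent replicates; each one stops after a finite
    (random) number of terms yet has expectation e under this toy set-up."""
    rng = random.Random(seed)
    draws = [debiased_estimate(taylor_increment, p_continue, rng)
             for _ in range(n_rep)]
    return sum(draws) / n_rep
```

The choice of the truncation distribution is where the trade-off lies: the tail probabilities P(N ≥ n) must decrease slowly enough relative to the size of Δ_n for the variance to stay finite, yet fast enough for the expected cost to stay finite.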

## Nonlinear Time Series just appeared

Posted in Books, R, Statistics, University life on February 26, 2014 by xi'an

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The book reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The latter includes MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”) more adequate for PhD students and researchers than starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) comes to balance the mathematical bent of the book in the first and third parts. A great reference book!