## not converging to London for an [extra]ordinary Read Paper

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , on November 21, 2014 by xi'an

On December 10, I will alas not travel to London to attend the Read Paper on sequential quasi-Monte Carlo presented by Mathieu Gerber and Nicolas Chopin to The Society, as I fly instead to Montréal for the NIPS workshops… I am quite sorry to miss this event, as this is a major paper which brings quasi-Monte Carlo methods into mainstream statistics. I will most certainly write a discussion and remind Og’s readers that contributed (800 words) discussions are welcome from everyone, the deadline for submission being January 02.

## Sequentially Constrained Monte Carlo

Posted in Books, Mountains, pictures, Statistics, University life with tags , , , , , , , , , , on November 7, 2014 by xi'an

This newly arXived paper by S. Golchi and D. Campbell from Vancouver (hence the above picture) considers the (quite) interesting problem of simulating from a target distribution defined by a constraint. This is a question that have bothered me for a long while as I could not come up with a satisfactory solution all those years… Namely, when considering a hard constraint on a density, how can we find a sequence of targets that end up with the restricted density? This is of course connected with the zero measure case posted a few months ago. For instance, how do we efficiently simulate a sample from a Student’s t distribution with a fixed sample mean and a fixed sample variance?

“The key component of SMC is the filtering sequence of distributions through which the particles evolve towards the target distribution.” (p.3)

This is indeed the main issue! The paper considers using a sequence of intermediate targets hardening progressively the constraint(s), along with an SMC sampler, but this recommendation remains rather vague and hence I am at loss as to how to make it work when the exact constraint implies a change of measure. The first example is monotone regression where y has mean f(x) and f is monotone. (Everything is unidimensional here.) The sequence is then defined by adding a multiplicative term that is a function of ∂f/∂x, for instance

Φ(τ∂f/∂x),

with τ growing to infinity to make the constraint moving from soft to hard. An interesting introduction, even though the hard constraint does not imply a change of parameter space or of measure. The second example is about estimating the parameters of an ODE, with the constraint being the ODE being satisfied exactly. Again, not exactly what I was looking for. But with an exotic application to deaths from the 1666 Black (Death) plague.

And then the third example is about ABC and the choice of summary statistics! The sequence of constraints is designed to keep observed and simulated summary statistics close enough when the dimension of those summaries increases, which means they are considered simultaneously rather than jointly. (In the sense of Ratmann et al., 2009. That is, with a multidimensional distance.) The model used for the application of the SMC is the dynamic model of Wood (2010, Nature). The outcome of this specific implementation is not that clear compared with alternatives… And again sadly does not deal with the/my zero measure issue.

Posted in Mountains, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , , on April 21, 2014 by xi'an

As I was flying over Skye (with [maybe] a first if hazy perspective on the Cuillin ridge!) to Iceland, three long sets of replies to some of my posts appeared on the ‘Og:

Thanks to them for taking the time to answer my musings…

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , , on April 15, 2014 by xi'an

“At equilibrium, we thus should not expect gains of several orders of magnitude.”

As was signaled to me several times during the MCqMC conference in Leuven, Rémi Bardenet, Arnaud Doucet and Chris Holmes (all from Oxford) just wrote a short paper for the proceedings of ICML on a way to speed up Metropolis-Hastings by reducing the number of terms one computes in the likelihood ratio involved in the acceptance probability, i.e.

$\prod_{i=1}^n\frac{L(\theta^\prime|x_i)}{L(\theta|x_i)}.$

The observations appearing in this likelihood ratio are a random subsample from the original sample. Even though this leads to an unbiased estimator of the true log-likelihood sum, this approach is not justified on a pseudo-marginal basis à la Andrieu-Roberts (2009). (Writing this in the train back to Paris, I am not convinced this approach is in fact applicable to this proposal as the likelihood itself is not estimated in an unbiased manner…)

In the paper, the quality of the approximation is evaluated by Hoeffding’s like inequalities, which serves as the basis for a stopping rule on the number of terms eventually evaluated in the random subsample. In fine, the method uses a sequential procedure to determine if enough terms are used to take the decision and the probability to take the same decision as with the whole sample is bounded from below. The sequential nature of the algorithm requires to either recompute the vector of likelihood terms for the previous value of the parameter or to store all of them for deriving the partial ratios. While the authors adress the issue of self-evaluating whether or not this complication is worth the effort, I wonder (from my train seat) why they focus so much on recovering the same decision as with the complete likelihood ratio and the same uniform. It would suffice to get the same distribution for the decision (an alternative that is easier to propose than to create of course). I also (idly) wonder if a Gibbs version would be manageable, i.e. by changing only some terms in the likelihood ratio at each iteration, in which case the method could be exact… (I found the above quote quite relevant as, in an alternative technique we are constructing with Marco Banterle, the speedup is particularly visible in the warmup stage.) Hence another direction in this recent flow of papers attempting to speed up MCMC methods against the incoming tsunami of “Big Data” problems.

## Nonlinear Time Series just appeared

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , , , , on February 26, 2014 by xi'an

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The books reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The later includes MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”) more adequate for PhD students and researchers than starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) comes to balance the mathematical bent of the book in the first and third parts. A great reference book!

## particle efficient importance sampling

Posted in Statistics with tags , , , , , , on October 15, 2013 by xi'an

Marcel Scharth and Robert Kohn just arXived a new article entitled “particle efficient importance sampling“. What is—the efficiency—about?! The spectacular diminution in variance—(the authors mention a factor of 6,000 when compared with regular particle filters!—in a stochastic volatility simulation study.

If I got the details right, the improvement stems from a paper by Richard and Zhang (Journal of  Econometrics, 2007). In a state-space/hidden Markov model setting, (non-sequential) importance sampling tries to approximate the smoothing distribution one term at a time, ie p(xt|xt-1,y1:n), but Richard and Zhang (2007) modify the target by looking at

p(yt|xt)p(xt|xt-1(xt-1,y1:n),

where the last term χ(xt-1,y1:n) is the normalising constant of the proposal kernel for the previous (in t-1) target, k(xt-1|xt-2,y1:n). This kernel is actually parameterised as k(xt-1|xt-2,at(y1:n)) and the EIS algorithm optimises those parameters, one term at a time. The current paper expands Richard and Zhang (2007) by using particles to approximate the likelihood contribution and reduce the variance once the “optimal” EIS solution is obtained. (They also reproduce Richard’s and Zhang’s tricks of relying on the same common random numbers.

This approach sounds like a “miracle” to me, in the sense(s) that (a) the “normalising constant” is far from being uniquely defined (and just as far from being constant in the parameter at) and (b) it is unrelated with the target distribution (except for the optimisation step). In the extreme case when the normalising constant is also constant… in at, this step clearly is useless. (This also opens the potential for an optimisation in the choice of χ(xt-1,y1:n)…)

The simulation study starts from a univariate stochastic volatility model relying on two hidden correlated AR(1) models. (There may be a typo in the definition in Section 4.1, i.e. a Φi missing.) In those simulations, EIS brings a significant variance reduction when compared with standard particle filters and particle EIS further improves upon EIS by a factor of 2 to 20 (in the variance). I could not spot in the paper which choice had been made for χ()… which is annoying as I gathered from my reading that it must have a strong impact on the efficiency attached to the name of the method!

## sampling from time-varying log-concave distributions

Posted in Statistics, University life with tags , , , , , on October 2, 2013 by xi'an

Sasha Rakhlin from Wharton sent me this paper he wrote (and arXived) with Hariharan Narayanan on a specific Markov chain algorithm that handles sequential Monte Carlo problems for log-concave targets. By relying on novel (by my standards) mathematical techniques, they manage to obtain geometric ergodicity results for random-walk based algorithms and log-concave targets. One of the new tools is the notion of self-concordant barrier, a sort of convex potential function F associated with a reference convex set and with Lipschitz properties. The second tool is a Gaussian distribution based on the metric induced by F. The third is the Dikin walk Markov chain, which uses this Gaussian as proposal and moves almost like the Metropolis-Hastings algorithm, except that it rejects with at least a probability of ½. The scale (or step size) of the Gaussian proposal is determined by the regularity of the log-concave target. In that setting, the total variation distance between the target at the t-th level and the distribution of the Markov chain can be fairly precisely approximated. Which leads in turn to a scaling of the number of random walk steps that are necessary to ensure convergence. Depending on the pace of the moving target, a single step of the random walk may be sufficient, which is quite an interesting feature.