## Nonlinear Time Series just appeared

Posted in Statistics, University life, Books, R with tags , , , , , , , , , , , , , , , on February 26, 2014 by xi'an

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The books reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The later includes MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”) more adequate for PhD students and researchers than starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) comes to balance the mathematical bent of the book in the first and third parts. A great reference book!

## evaluating stochastic algorithms

Posted in Books, R, Statistics, University life with tags , , , , , , , , on February 20, 2014 by xi'an

Reinaldo sent me this email a long while ago

Could you recommend me a nice reference about
measures to evaluate stochastic algorithms (in
particular focus in approximating posterior
distributions).


and I hope he is still reading the ‘Og, despite my lack of prompt reply! I procrastinated and procrastinated in answering this question as I did not have a ready reply… We have indeed seen (almost suffered from!) a flow of MCMC convergence diagnostics in the 90′s.  And then it dried out. Maybe because of the impossibility to be “really” sure, unless running one’s MCMC much longer than “necessary to reach” stationarity and convergence. The heat of the dispute between the “single chain school” of Geyer (1992, Statistical Science) and the “multiple chain school” of Gelman and Rubin (1992, Statistical Science) has since long evaporated. My feeling is that people (still) run their MCMC samplers several times and check for coherence between the outcomes. Possibly using different kernels on parallel threads. At best, but rarely, they run (one or another form of) tempering to identify the modal zones of the target. And instances where non-trivial control variates are available are fairly rare. Hence, a non-sequitur reply at the MCMC level. As there is no automated tool available, in my opinion. (Even though I did not check the latest versions of BUGS.)

As it happened, Didier Chauveau from Orléans gave today a talk at Big’MC on convergence assessment based on entropy estimation, a joint work with Pierre Vandekerkhove. He mentioned SamplerCompare which is an R package that appeared in 2010. Soon to come is their own EntropyMCMC package, using parallel simulation. And k-nearest neighbour estimation.

If I re-interpret the question as focussed on ABC algorithms, it gets both more delicate and easier. Easy because each ABC distribution is different. So there is no reason to look at the unreachable original target. Delicate because there are several parameters to calibrate (tolerance, choice of summary, …) on top of the number of MCMC simulations. In DIYABC, the outcome is always made of the superposition of several runs to check for stability (or lack thereof). But this tells us nothing about the distance to the true original target. The obvious but impractical answer is to use some basic bootstrapping, as it is generally much too costly.

## finite mixture models [book review]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , on February 17, 2014 by xi'an

Here is a review of Finite Mixture Models (2000) by Geoff McLachlan & David Peel that I wrote aeons ago (circa 1999), supposedly for JASA, which lost first the files and second the will to publish it. As I was working with my student today, I mentioned the book to her and decided to publish it here, if only because I think the book deserved a positive review, even after all those years! (Since then, Sylvia Frühwirth-Schnatter published Finite Mixture and Markov Switching Models (2004), which is closer to my perspective on the topic and that I would more naturally recommend.)

Mixture modeling, that is, the use of weighted sums of standard distributions as in

$\sum_{i=1}^k p_i f({\mathbf y};{\mathbf \theta}_i)\,,$

is a widespread and increasingly used technique to overcome the rigidity of standard parametric distributions such as f(y;θ), while retaining a parametric nature, as exposed in the introduction of my JASA review to Böhning’s (1998) book on non-parametric mixture estimation (Robert, 2000). This review pointed out that, while there are many books available on the topic of mixture estimation, the unsurpassed reference remained the book by Titterington, Smith and Makov (1985)  [hereafter TSM]. I also suggested that a new edition of TSM would be quite timely, given the methodological and computational advances that took place in the past 15 years: while it remains unclear whether or not this new edition will ever take place, the book by McLachlan and Peel gives an enjoyable and fairly exhaustive update on the topic, incorporating the most recent advances on mixtures and some related models.

Geoff McLachlan has been a major actor in the field for at least 25 years, through papers, software—the book concludes with a review of existing software—and books: McLachlan (1992), McLachlan and Basford (1988), and McLachlan and Krishnan (1997). I refer the reader to Lindsay (1989) for a review of the second book, which is a forerunner of, and has much in common with, the present book. Continue reading

## the alive particle filter

Posted in Books, Statistics, University life with tags , , , , , , , , on February 14, 2014 by xi'an

As mentioned earlier on the ‘Og, this is a paper written by Ajay Jasra, Anthony Lee, Christopher Yau, and Xiaole Zhang that I missed when it got arXived (as I was also missing my thumb at the time…) The setting is a particle filtering one with a growing product of spaces and constraints on the moves between spaces. The motivating example is one of an ABC algorithm for an HMM where at each time, the simulated (pseudo-)observation is forced to remain within a given distance of the true observation. The (one?) problem with this implementation is that the particle filter may well die out by making only proposals that stand out of the above ball. Based on an idea of François Le Gland and Nadia Oudjane, the authors define the alive filter by imposing a fixed number of moves onto the subset, running a negative binomial number of proposals. By removing the very last accepted particle, they show that the negative binomial experiment allows for an unbiased estimator of the normalising constant(s). Most of the paper is dedicated to the theoretical study of this scheme, with results summarised as (p.2)

1. Time uniform Lp bounds for the particle filter estimates
2. A central limit theorem (CLT) for suitably normalized and centered particle filter estimates
3. An unbiased property of the particle filter estimates
4. The relative variance of the particle filter estimates, assuming N = O(n), is shown to grow linearly in n.

The assumptions necessary to reach those impressive properties are fairly heavy (or “exceptionally strong” in the words of the authors, p.5): the original and constrained transition kernels are both uniformly ergodic, with equivalent coverage of the constrained subsets for all possible values of the particle at the previous step. I also find the proposed implementation of the ABC filtering inadequate for approximating the posterior on the parameters of the (HMM) model. Expecting every realisation of the simulated times series to be close to the corresponding observed value is too hard a constraint. The above results are scaled in terms of the number N of accepted particles but it may well be that the number of generated particles and hence the overall computing time is much larger. In the examples completing the paper, the comparison is run with an earlier ABC sampler based on the very same stepwise proximity to the observed series so the impact of this choice is difficult to assess and, furthermore, the choice of the tolerances ε is difficult to calibrate: is 3, 6, or 12 a small or large value for ε? A last question that I heard from several sources during talks on that topic is why an ABC approach would be required in HMM settings where SMC applies. Given that the ABC reproduces a simulation on the pair latent variables x parameters, there does not seem to be a gain there…

## cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life with tags , , , , , , , , , , , , , on January 29, 2014 by xi'an

At MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea [although this happened to be one of those few sessions where the flu had not yet completely taken over!] and as I wanted to check about a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed out to the relevant section of the BUGS book, reproduced above. (Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and not connected in any way to Martyn’s talk or work!)

I cannot say I get the idea any clearer from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a cut is the removal of an edge in a probabilistic graph, however there must/may be some arbitrariness in building the wrong conditional distribution. In the Poisson-binomial case treated in Martyn’s case, I interpret the cut as simulating from

$\pi(\phi|z)\pi(\theta|\phi,y)=\dfrac{\pi(\phi)f(z|\phi)}{m(z)}\dfrac{\pi(\theta|\phi)f(y|\theta,\phi)}{m(y|\phi)}$

$\pi(\phi|z,\mathbf{y})\pi(\theta|\phi,y)\propto\pi(\phi)f(z|\phi)\pi(\theta|\phi)f(y|\theta,\phi)$