Archive for MCMC
My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The books reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The later includes MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”) more adequate for PhD students and researchers than starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) comes to balance the mathematical bent of the book in the first and third parts. A great reference book!
Reinaldo sent me this email a long while ago
Could you recommend me a nice reference about measures to evaluate stochastic algorithms (in particular focus in approximating posterior distributions).
and I hope he is still reading the ‘Og, despite my lack of prompt reply! I procrastinated and procrastinated in answering this question as I did not have a ready reply… We have indeed seen (almost suffered from!) a flow of MCMC convergence diagnostics in the 90′s. And then it dried out. Maybe because of the impossibility to be “really” sure, unless running one’s MCMC much longer than “necessary to reach” stationarity and convergence. The heat of the dispute between the “single chain school” of Geyer (1992, Statistical Science) and the “multiple chain school” of Gelman and Rubin (1992, Statistical Science) has since long evaporated. My feeling is that people (still) run their MCMC samplers several times and check for coherence between the outcomes. Possibly using different kernels on parallel threads. At best, but rarely, they run (one or another form of) tempering to identify the modal zones of the target. And instances where non-trivial control variates are available are fairly rare. Hence, a non-sequitur reply at the MCMC level. As there is no automated tool available, in my opinion. (Even though I did not check the latest versions of BUGS.)
As it happened, Didier Chauveau from Orléans gave today a talk at Big’MC on convergence assessment based on entropy estimation, a joint work with Pierre Vandekerkhove. He mentioned SamplerCompare which is an R package that appeared in 2010. Soon to come is their own EntropyMCMC package, using parallel simulation. And k-nearest neighbour estimation.
If I re-interpret the question as focussed on ABC algorithms, it gets both more delicate and easier. Easy because each ABC distribution is different. So there is no reason to look at the unreachable original target. Delicate because there are several parameters to calibrate (tolerance, choice of summary, …) on top of the number of MCMC simulations. In DIYABC, the outcome is always made of the superposition of several runs to check for stability (or lack thereof). But this tells us nothing about the distance to the true original target. The obvious but impractical answer is to use some basic bootstrapping, as it is generally much too costly.
Here is a review of Finite Mixture Models (2000) by Geoff McLachlan & David Peel that I wrote aeons ago (circa 1999), supposedly for JASA, which lost first the files and second the will to publish it. As I was working with my student today, I mentioned the book to her and decided to publish it here, if only because I think the book deserved a positive review, even after all those years! (Since then, Sylvia Frühwirth-Schnatter published Finite Mixture and Markov Switching Models (2004), which is closer to my perspective on the topic and that I would more naturally recommend.)
Mixture modeling, that is, the use of weighted sums of standard distributions as in
is a widespread and increasingly used technique to overcome the rigidity of standard parametric distributions such as f(y;θ), while retaining a parametric nature, as exposed in the introduction of my JASA review to Böhning’s (1998) book on non-parametric mixture estimation (Robert, 2000). This review pointed out that, while there are many books available on the topic of mixture estimation, the unsurpassed reference remained the book by Titterington, Smith and Makov (1985) [hereafter TSM]. I also suggested that a new edition of TSM would be quite timely, given the methodological and computational advances that took place in the past 15 years: while it remains unclear whether or not this new edition will ever take place, the book by McLachlan and Peel gives an enjoyable and fairly exhaustive update on the topic, incorporating the most recent advances on mixtures and some related models.
Geoff McLachlan has been a major actor in the field for at least 25 years, through papers, software—the book concludes with a review of existing software—and books: McLachlan (1992), McLachlan and Basford (1988), and McLachlan and Krishnan (1997). I refer the reader to Lindsay (1989) for a review of the second book, which is a forerunner of, and has much in common with, the present book. Continue reading
As mentioned earlier on the ‘Og, this is a paper written by Ajay Jasra, Anthony Lee, Christopher Yau, and Xiaole Zhang that I missed when it got arXived (as I was also missing my thumb at the time…) The setting is a particle filtering one with a growing product of spaces and constraints on the moves between spaces. The motivating example is one of an ABC algorithm for an HMM where at each time, the simulated (pseudo-)observation is forced to remain within a given distance of the true observation. The (one?) problem with this implementation is that the particle filter may well die out by making only proposals that stand out of the above ball. Based on an idea of François Le Gland and Nadia Oudjane, the authors define the alive filter by imposing a fixed number of moves onto the subset, running a negative binomial number of proposals. By removing the very last accepted particle, they show that the negative binomial experiment allows for an unbiased estimator of the normalising constant(s). Most of the paper is dedicated to the theoretical study of this scheme, with results summarised as (p.2)
1. Time uniform Lp bounds for the particle filter estimates
2. A central limit theorem (CLT) for suitably normalized and centered particle filter estimates
3. An unbiased property of the particle filter estimates
4. The relative variance of the particle filter estimates, assuming N = O(n), is shown to grow linearly in n.
The assumptions necessary to reach those impressive properties are fairly heavy (or “exceptionally strong” in the words of the authors, p.5): the original and constrained transition kernels are both uniformly ergodic, with equivalent coverage of the constrained subsets for all possible values of the particle at the previous step. I also find the proposed implementation of the ABC filtering inadequate for approximating the posterior on the parameters of the (HMM) model. Expecting every realisation of the simulated times series to be close to the corresponding observed value is too hard a constraint. The above results are scaled in terms of the number N of accepted particles but it may well be that the number of generated particles and hence the overall computing time is much larger. In the examples completing the paper, the comparison is run with an earlier ABC sampler based on the very same stepwise proximity to the observed series so the impact of this choice is difficult to assess and, furthermore, the choice of the tolerances ε is difficult to calibrate: is 3, 6, or 12 a small or large value for ε? A last question that I heard from several sources during talks on that topic is why an ABC approach would be required in HMM settings where SMC applies. Given that the ABC reproduces a simulation on the pair latent variables x parameters, there does not seem to be a gain there…