Archive for Monte Carlo Statistical Methods
My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The books reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The later includes MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”) more adequate for PhD students and researchers than starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) comes to balance the mathematical bent of the book in the first and third parts. A great reference book!
Unfortunately, I will miss the incoming Bayes in Paris seminar next Thursday (27th February), as I will be flying to Montréal and then Québec at the time (despite having omitted to book a flight till now!). Indeed Amandine Shreck will give a talk at 2pm in room 18 of ENSAE, Malakoff, on A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection, a work written jointly with Gersende Fort, Sylvain Le Corff, and Eric Moulines, and arXived at the end of 2013 (which may explain why I missed it!). Here is the abstract:
This paper introduces a new Markov Chain Monte Carlo method to perform Bayesian variable selection in high dimensional settings. The algorithm is a Hastings-Metropolis sampler with a proposal mechanism which combines (i) a Metropolis adjusted Langevin step to propose local moves associated with the differentiable part of the target density with (ii) a shrinkage-thresholding step based on the non-differentiable part of the target density which provides sparse solutions such that small components are shrunk toward zero. This allows to sample from distributions on spaces with different dimensions by actually setting some components to zero. The performances of this new procedure are illustrated with both simulated and real data sets. The geometric ergodicity of this new transdimensional Markov Chain Monte Carlo sampler is also established.
(I will definitely get a look at the paper over the coming days!)
Reinaldo sent me this email a long while ago
Could you recommend me a nice reference about measures to evaluate stochastic algorithms (in particular focus in approximating posterior distributions).
and I hope he is still reading the ‘Og, despite my lack of prompt reply! I procrastinated and procrastinated in answering this question as I did not have a ready reply… We have indeed seen (almost suffered from!) a flow of MCMC convergence diagnostics in the 90′s. And then it dried out. Maybe because of the impossibility to be “really” sure, unless running one’s MCMC much longer than “necessary to reach” stationarity and convergence. The heat of the dispute between the “single chain school” of Geyer (1992, Statistical Science) and the “multiple chain school” of Gelman and Rubin (1992, Statistical Science) has since long evaporated. My feeling is that people (still) run their MCMC samplers several times and check for coherence between the outcomes. Possibly using different kernels on parallel threads. At best, but rarely, they run (one or another form of) tempering to identify the modal zones of the target. And instances where non-trivial control variates are available are fairly rare. Hence, a non-sequitur reply at the MCMC level. As there is no automated tool available, in my opinion. (Even though I did not check the latest versions of BUGS.)
As it happened, Didier Chauveau from Orléans gave today a talk at Big’MC on convergence assessment based on entropy estimation, a joint work with Pierre Vandekerkhove. He mentioned SamplerCompare which is an R package that appeared in 2010. Soon to come is their own EntropyMCMC package, using parallel simulation. And k-nearest neighbour estimation.
If I re-interpret the question as focussed on ABC algorithms, it gets both more delicate and easier. Easy because each ABC distribution is different. So there is no reason to look at the unreachable original target. Delicate because there are several parameters to calibrate (tolerance, choice of summary, …) on top of the number of MCMC simulations. In DIYABC, the outcome is always made of the superposition of several runs to check for stability (or lack thereof). But this tells us nothing about the distance to the true original target. The obvious but impractical answer is to use some basic bootstrapping, as it is generally much too costly.
Crakel and Flegal just arXived a short paper running ABC for doing inference on the parameters of two families of bivariate betas. And I could not but read it thru. And wonder why ABC was that necessary to handle the model. The said bivariate betas are defined from
This makes each term in the pair Beta and the two components dependent. This construct was proposed by Arnold and Ng (2011). (The five-parameter version cancels the gammas for i=3,4,5.)
Since the pdf of the joint distribution is not available in closed form, Crakel and Flegal zoom on ABC-MCMC as the method of choice and discuss simulation experiments. (The choice of the tolerance ε as an absolute rather than relative value, ε=0.2,0.0.6,0.8, puzzles me, esp. since the distance between the summary statistics is not scaled.) I however wonder why other approaches are impossible. (Or why it is necessary to use this distribution to model correlated betas. Unless I am confused copulas were invented to this effect.) First, this is a latent variable model, so latent variables could be introduced inside an MCMC scheme. A wee bit costly but feasible. Second, several moments of those distributions are known so a empirical likelihood approach could be considered.