Archive for convergence

running MCMC for too long, and even longer…

Posted in Books, pictures, Running, Statistics, University life on October 23, 2013 by xi'an

[Clifton observatory, Clifton, Sept. 24, 2012]

Following my earlier post about the young astronomer who feared he was running his MCMC for too long, here is an update from his visit to my office this morning, which proved quite instructive for both of us. (Disclaimer: the picture of an observatory seen from across Brunel’s suspension bridge in Bristol is, as earlier, completely unrelated to the young astronomer!)

First, the reason why he thought the MCMC was running for too long was that the acceptance rate was plummeting to zero, whatever the random-walk scale. The explanation for this behaviour is that he was actually running a standard simulated annealing algorithm, hence observing the stabilisation of the Markov chain in one of the (global) modes of the target function. In that sense, he was right that the MCMC was run for “too long”, as there was nothing left to expect once the mode had been reached and the temperature turned down to zero. So the algorithm was working correctly.
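
To make the diagnosis concrete, here is a minimal simulated annealing sketch in R, on a made-up bimodal energy function (nothing to do with the astronomer’s actual code or model): once the chain has settled in a mode and the temperature is close to zero, almost every proposed move is uphill and hence rejected, so the acceptance rate drops towards zero whatever the random-walk scale.

    # minimal simulated annealing sketch on a toy bimodal energy function
    f <- function(x) (x^2 - 1)^2              # global modes (minima) at x = ±1
    x <- 2; temp <- 1; scale <- 0.5
    nsim <- 1e4; accept <- 0
    for (t in 1:nsim) {
      y <- x + scale * rnorm(1)               # random-walk proposal
      # downhill moves always accepted, uphill moves with prob exp(-Δf/temp)
      if (runif(1) < exp((f(x) - f(y)) / temp)) { x <- y; accept <- accept + 1 }
      temp <- 0.999 * temp                    # geometric cooling towards zero
    }
    accept / nsim                             # acceptance rate: vanishingly small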

Second, the astronomy problem he considers has a rather complex likelihood, for which he substituted a distance between the (discretised) observed data and (discretised) simulated data, simulated conditional on the current parameter value. Now… does this ring a bell? If not, here is a three-letter clue: ABC… Indeed, the trick he had found to get around this likelihood calculation issue was to re-invent a version of ABC-MCMC! Except that the distance was re-introduced into a regular MCMC scheme as a substitute for the log-likelihood, and compared with the distance at the previous MCMC iteration. This is quite clever, even though this substitution suffers from a normalisation issue (that I already mentioned in the post about Holmes’ and Walker’s idea to turn loss functions into pseudo-likelihoods). Regular ABC does not encounter this difficulty, obviously. I am still bemused by this reinvention of ABC from scratch!
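
For comparison, here is what a minimal correct ABC-MCMC step (in the spirit of the standard algorithm of Marjoram et al.) looks like, sketched in R on a toy normal-mean model that merely stands in for the astronomical likelihood: with a flat prior and a symmetric random walk, the Metropolis-Hastings ratio reduces to the indicator that the freshly simulated pseudo-data falls within a tolerance ε of the observations, rather than recycling the distance as a pseudo-log-likelihood.

    # toy ABC-MCMC sketch: a normal mean model stands in for the real likelihood
    yobs  <- rnorm(100, mean = 3)                  # "observed" data
    dsum  <- function(y) abs(mean(y) - mean(yobs)) # distance between summaries
    eps   <- 0.1                                   # tolerance
    theta <- mean(yobs)  # start within the ε-ball (e.g. from a pilot ABC run)
    scale <- 0.5; nsim <- 1e4
    chain <- numeric(nsim)
    for (t in 1:nsim) {
      prop <- theta + scale * rnorm(1)             # symmetric random-walk move
      ysim <- rnorm(100, mean = prop)              # pseudo-data given prop
      # flat prior + symmetric proposal: accept iff the pseudo-data is ε-close
      if (dsum(ysim) < eps) theta <- prop
      chain[t] <- theta
    }
    mean(chain)                                    # ABC posterior mean, near 3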

So we are now at a stage where my young friend will experiment with (hopefully) correct ABC steps, trying to derive the tolerance value from warmup simulations and use some of the accelerating tricks suggested by Umberto Picchini and Julie Forman to avoid simulating the characteristics of millions of stars for nothing. And we agreed to meet soon for an update. Indeed, a fairly profitable morning for both of us!

running MCMC for too long…

Posted in Books, pictures, Running, Statistics, University life on October 20, 2013 by xi'an

[Clifton observatory, Clifton, Sept. 24, 2012]

Here is an excerpt from an email I just received from a young astronomer with whom I have had many email exchanges about the nature and implementation of MCMC algorithms, apparently without managing to make my point:

The acceptance ratio turn to be good if I used (imposed by me) smaller numbers of iterations. What I think I am doing wrong is the convergence criteria. I am not stopping when I should stop.

To which I replied that he should come (or Skype) and talk with me, as I could not get into enough detail over email to point out where his analysis went wrong… It may be the case that the MCMC algorithm finds a first mode, explores its neighbourhood (hence a good acceptance rate and green signals for convergence), then wanders away, attracted by other modes. It may also be the case that the code has mistakes. Anyway, you cannot stop a (basic) MCMC algorithm too late or let it run for too long! (Disclaimer: the picture of an observatory seen from across Brunel’s suspension bridge in Bristol is unrelated to the young astronomer!)
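
The first scenario is easy to reproduce: in the toy R experiment below (a well-separated two-component normal mixture, obviously unrelated to the astronomer’s problem), the acceptance rate over the early iterations looks perfectly healthy while the chain has not even seen the second mode.

    # random-walk Metropolis stuck on the first mode of a bimodal target
    target <- function(x) 0.5 * dnorm(x, -10) + 0.5 * dnorm(x, 10)
    x <- -10; scale <- 1; nsim <- 1e5
    chain <- numeric(nsim); acc <- logical(nsim)
    for (t in 1:nsim) {
      y <- x + scale * rnorm(1)
      if (runif(1) < target(y) / target(x)) { x <- y; acc[t] <- TRUE }
      chain[t] <- x
    }
    mean(acc[1:1000])   # early acceptance rate: looks perfectly fine
    mean(chain > 0)     # time spent around the second mode: essentially zero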

Bayesian brittleness

Posted in Statistics on May 3, 2013 by xi'an

Here is the abstract of a recently arXived paper that attracted my attention:

Although it is known that Bayesian estimators may be inconsistent if the model is misspecified, it is also a popular belief that a “good” or “close” enough model should have good convergence properties. This paper shows that, contrary to popular belief, there is no such thing as a “close enough” model in Bayesian inference in the following sense: we derive optimal lower and upper bounds on posterior values obtained from models that exactly capture an arbitrarily large number of finite-dimensional marginals of the data-generating distribution and/or that are arbitrarily close to the data-generating distribution in the Prokhorov or total variation metrics; these bounds show that such models may still make the largest possible prediction error after conditioning on an arbitrarily large number of sample data. Therefore, under model misspecification, and without stronger assumptions than (arbitrary) closeness in Prokhorov or total variation metrics, Bayesian inference offers no better guarantee of accuracy than arbitrarily picking a value between the essential infimum and supremum of the quantity of interest. In particular, an unscrupulous practitioner could slightly perturb a given prior and model to achieve any desired posterior conclusions.

The paper is both too long and too theoretical for me to get deep enough into it. The main point, however, is that, given the space of all possible measures, the set of (parametric) Bayes inferences constitutes a tiny finite-dimensional subset that may lie far, far away from the true model. I do not find the result unreasonable, far from it!, but the fact that Bayesian (and other) inferences may be inconsistent for most misspecified models is not such a major issue in my opinion. (Witness my post on the Robins-Wasserman paradox.) I am not so much convinced either about this “popular belief that a ‘good’ or ‘close’ enough model should have good convergence properties”, as it is intuitively reasonable that the immensity of the space of all models can induce non-convergent behaviours. The statistical question is rather what can be done about it. Does it matter that the model is misspecified? If it does, is there any meaning in estimating parameters without a model? For a finite sample size, should we bother at all that the model is not “right” or “close enough” if discrepancies cannot be detected at this precision level? I think the answer to all those questions is negative and that we should proceed with our imperfect models and imperfect inference as long as our imperfect simulation tools do not exhibit strong divergences.

AMIS convergence, at last!

Posted in Statistics, University life on November 19, 2012 by xi'an

This afternoon, Jean-Michel Marin gave his talk at the big’MC seminar. As already posted, it was about a convergence proof for AMIS, which gave me the opportunity to simultaneously read the paper and listen to the author. The core idea for adapting AMIS towards a manageable version is to update the proposal parameter based on the current sample rather than on the whole past. This facilitates the task of establishing convergence to the optimal (pseudo-true) value of the parameter, under the assumption that the optimal value is a known moment of the target. From there, convergence of the weighted mean is somehow natural when the number of simulations grows to infinity. (Note the special asymptotics of AMIS, though, which are that the number of steps goes to infinity while the number of simulations per step grows a wee faster than linearly. In this respect, it is the opposite of PMC, where convergence is of a more traditional nature, pushing the number of simulations per step to infinity.) The second part of the convergence proof is more intricate, as it establishes that the multiple mixture estimator based on the “forward-backward” reweighting of all simulations since step zero does converge to the proper posterior moment. This relies on rather complex assumptions, but remains a magnificent tour de force. During the talk, I wondered if, given the Markovian nature of the algorithm (since reweighting only occurs once simulation is over), an alternative estimator based on the optimal value of the simulation parameter would not be better than the original multiple mixture estimator: the proof is based on the equivalence between both versions…
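
To fix ideas, here is a toy R rendering of that structure, with a one-dimensional Gaussian “posterior” and Gaussian proposals (the actual algorithm, its assumptions, and its asymptotics are of course much richer than this sketch): each step updates the proposal parameters from the current sample only, while the final estimator reweights every simulation since step one by the full proposal mixture.

    # toy AMIS-type sketch: Gaussian target, adaptive Gaussian proposals
    target <- function(x) dnorm(x, mean = 2, sd = 1)   # stand-in for a posterior
    nsteps <- 20; N <- 500
    mu <- 0; sig <- 3                                  # initial proposal parameters
    sims <- vector("list", nsteps); mus <- sigs <- numeric(nsteps)
    for (t in 1:nsteps) {
      mus[t] <- mu; sigs[t] <- sig
      x <- rnorm(N, mu, sig); sims[[t]] <- x
      # update the proposal from the *current* sample only (manageable version)
      w   <- target(x) / dnorm(x, mu, sig)
      mu  <- sum(w * x) / sum(w)
      sig <- sqrt(sum(w * (x - mu)^2) / sum(w))
    }
    # "forward-backward" reweighting of all simulations by the proposal mixture
    allx <- unlist(sims)
    mixq <- rowMeans(sapply(1:nsteps, function(t) dnorm(allx, mus[t], sigs[t])))
    w <- target(allx) / mixq
    sum(w * allx) / sum(w)                             # estimated posterior mean ≈ 2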

ABC as knn…

Posted in Statistics on September 13, 2012 by xi'an

Gérard Biau, Frédéric Cérou, and Arnaud Guyader recently posted an arXiv paper on the foundations of ABC, entitled “New insights into Approximate Bayesian Computation“. They also submitted it to several statistics journals, with no success so far, and I find this rather surprising. Indeed, the paper analyses the ABC algorithm the way it is truly implemented (as in DIYABC for instance), i.e. with a tolerance bound ε that is determined as a quantile of the simulated distances, say the 10% or the 1% quantile. This means in particular that the interpretation of ε as a non-parametric bandwidth, while interesting and prevalent in the literature (see, e.g., Fearnhead and Prangle’s discussion paper), is only an approximation of the actual practice.
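
In R, and on a toy normal-mean model chosen only for illustration, this quantile implementation of ABC looks as follows: the tolerance is not fixed in advance but set to, say, the 1% quantile of the simulated distances, so that the accepted parameters are exactly the k nearest neighbours of the observed summary.

    # toy ABC rejection sampler with the tolerance set as a quantile of distances
    yobs  <- rnorm(50, mean = 1)                        # pseudo-observed sample
    N     <- 1e4
    theta <- rnorm(N, 0, 10)                            # draws from the prior
    dists <- sapply(theta, function(th)
                    abs(mean(rnorm(50, th)) - mean(yobs)))
    eps   <- quantile(dists, 0.01)                      # ε = 1% distance quantile
    post  <- theta[dists <= eps]                        # k = N/100 nearest neighbours
    mean(post)                                          # ABC posterior mean, near 1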

The authors of this new paper focus on the mathematical foundations of this practice, by (re)analysing ABC as a k-nearest neighbour (knn) method. Using generic knn results, they thus derive a consistency property for the ABC algorithm by imposing some constraints upon the rate of decrease of the quantile as a function of n. (The setting is restricted to the use of sufficient statistics or, equivalently, to a distance over the whole sample. The issue of summary statistics is not addressed by the paper.) The paper also contains a perfectly rigorous proof (the first one?) of the convergence of ABC when the tolerance ε goes to zero. The mean integrated square error consistency of the conditional kernel density estimate is established for a generic kernel (under usual assumptions). Further assumptions (on the target and on the kernel) allow the authors to obtain precise convergence rates (as a power of the sample size), derived from classical k-nearest neighbour regression, like

k_N \approx N^{(p+4)/(m+p+4)}

in dimensions m larger than 4… The paper is completely theoretical and highly mathematical (with 25 pages of proofs!), which may explain why it did not meet with success with editors and/or referees. However, I definitely think (an abridged version of) this work clearly deserves publication in a top statistics journal as a reference for the justification of ABC! The authors also mention future work in that direction: I would strongly suggest they consider the case of insufficient summary statistics from this knn perspective.

potentially relevant

Posted in pictures, Statistics, Travel, University life on March 14, 2012 by xi'an

This week, freshly back from Roma, I got the reviews of our paper “Relevant statistics for Bayesian model choice” from Series B. The comments are detailed and mostly to the point, the major issue being a concern about the relevance of the paper for statistical methodology. We are thus asked for a revision making a much better connection with ABC methodology.

This is not an unexpected outcome, from my point of view, because the paper is indeed quite theoretical and the mathematical assumptions required to obtain the convergence theorems are rather overwhelming… meaning that in practical cases they cannot truly be checked. However, I think we can eventually address those concerns, for two distinct reasons. First, the paper comes as the third step in a series of papers where we first identified a sufficiency property, then realised that this property was actually quite a rare occurrence, and finally made a theoretical advance as to when a summary statistic is enough (i.e. “sufficient” in the standard sense of the term!) to conduct model choice, with the clear answer that the mean ranges of the summary statistic under each model should not intersect. Second, my own personal view is that those assumptions needed for convergence are not of the highest importance for statistical practice (even though they are needed in the paper!) and thus that, from a methodological point of view, only the conclusion should be taken into account. It is then rather straightforward to come up with (quick-and-dirty) simulation devices to check whether a summary statistic behaves differently under both models, taking advantage of the reference table already available (instead of having to run Monte Carlo experiments on an ABC basis)…
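
For instance, here is a quick-and-dirty R sketch of such a check, on a toy Poisson versus geometric pair with the sample mean as summary statistic (an illustrative choice, not one of the paper’s actual experiments): simulate the summary under each model and compare its ranges. Here both models can produce the same mean, so the ranges overlap massively and the summary is unfit for choosing between them.

    # quick-and-dirty check: does the summary statistic separate the two models?
    # sample mean under Poisson(2) versus a geometric with the same mean
    nrep <- 1e4; n <- 50
    s1 <- replicate(nrep, mean(rpois(n, lambda = 2)))   # summary under model 1
    s2 <- replicate(nrep, mean(rgeom(n, prob = 1/3)))   # summary under model 2
    range(s1)
    range(s2)   # heavily overlapping ranges: this summary cannot tell the
                # models apart, whatever the tolerance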

One of the comments was that maybe Bayes factors were not appropriate for conducting model choice, thus making the whole derivation irrelevant. This is a possible perspective, but it can be objected that Bayes factors and posterior probabilities are used in conjunction with ABC in dozens of genetics papers. Further arguments are provided in the various replies to both of Templeton’s radical criticisms. It is quite correct that more empirical and model-based assessments are also available, as demonstrated by the multicriterion approach of Olli Ratmann and co-authors. This is simply another approach, not followed by most geneticists so far…

updated slides for ABC PhD course

Posted in pictures, R, Statistics, Travel, University life on February 8, 2012 by xi'an

Over the weekend, I have added a few slides referring to recent papers mentioning the convergence of ABC algorithms, in particular the very relevant paper by Dean et al. that I had already discussed in an earlier post. (This course is taking a much larger chunk of my time than I expected! I am thus relieved I will use the same slides in Roma next month and presumably in Melbourne next summer…)
