I was attending a lecture this morning at CREST by Patrice Bertail where he was using estimated renewal parameters on a Markov chain to build (asymptotically) convergent bootstrap procedures. Estimating renewal parameters is obviously of interest in MCMC algorithms as they can be used to assess the convergence of the associated Markov chain: That is, if the estimation does not induce a significant bias. Another question that came to me during the talk is that; since those convergence assessments techniques are formally holding for any small set, choosing the small set in order to maximise the renewal rate also maximises the number of renewal events and hence the number of terms in the control sequence: Thus, the maximal renewal rate þ is definitely a quantity of interest: Now, is this quantity þ an intrinsic parameter of the chain, i.e. a quantity that drives its mixing and/or converging behaviour(s)? For instance; an iid sequence has a renewal rate of 1; because the whole set is a “small” set. Informally, the time between two consecutive renewal events is akin to the time between two simulations from the target and stationary distribution, according to the Kac’s representation we used in our AAP paper with Jim Hobert. So it could be that þ is directly related with the effective sample size of the chain, hence the autocorrelation. (A quick web search did not produce anything relevant:) Too bad this question did not pop up last week when I had the opportunity to discuss it with Sean Meyn in Gainesville!
Archive for effective sample size
Jean-Michel Marin just posted on arXiv a joint paper of ours, Efficient learning in ABC algorithms. This paper, to which elaboration [if not redaction] I contributed to, is one of the chapters of Mohammed Sedki’s thesis. (Mohammed is about to defend this thesis, currently with reviewers. A preliminary version of this paper was presented at ABC in London and it is in revision with Statistics and Computing. Hence missing the special issue!)
The paper builds on the sequential ABC scheme of Del Moral et al. (2012), already discussed in this post, where the tolerance level at each step is adapted from the previous iterations as a quantile of the distances. (The quantile level is based on a current effective sample size.) In a “systematic” step, the particles that are closest to the observations are preserved and duplicated, while those farther away are sampled randomly. The resulting population of particles is then perturbed by an adaptive (random walk) kernel and the algorithm stops when the tolerance level does not decrease any longer or when the acceptance rate of the Metropolis step is too low. Pierre Pudlo and Mohammed Sedki experimented a parallel implementation of the algorithm, with an almost linear improvement in the number of cores. It is less clear the same would work on a GPU because of the communication requirements. Overall, the new algorithm brings a significant improvement in computing time when compared with earlier versions, mainly because the number of simulations from the model is minimised. (It was tested on a rather complex population scenario retracing the invasion of honeybees in Europe (in connection with the previous post!)
In the previous days I have received several emails asking for clarification of the effective sample size derivation in “Introducing Monte Carlo Methods with R” (Section 4.4, pp. 98-100). Formula (4.3) gives the Monte Carlo estimate of the variance of a self-normalised importance sampling estimator (note the change from the original version in Introducing Monte Carlo Methods with R ! The weight W is unnormalised and hence the normalising constant appears in the denominator.)
Now, the front term is somehow obvious so let us concentrate on the bracketed part. The empirical variance of the ‘s is
the coefficient is thus estimated by
which leads to the definition of the effective sample size
The confusing part in the current version is whether or not we use normalised W’s and ‘s. I hope this clarifies the issue!