Archive for Markov kernel
important Markov chains
Posted in Books, Statistics, University life with tags arXiv, CLT, geometric ergodicity, importance sampling, Law of Large Numbers, Markov kernel, MCMC, PhD students, residual sampling, semi-Markov chain, Université Paris Dauphine, vanilla Rao-Blackwellisation on July 21, 2022 by xi'an
With Charly Andral (PhD, Paris Dauphine), Randal Douc, and Hugo Marival (PhD, Telecom SudParis), we just arXived a paper on importance Markov chains that merges importance sampling and MCMC. An idea already mentioned in Hastings (1970) and even earlier in Fosdick (1963), and later exploited in Liu et al. (2003) for instance. And somewhat the dual of the vanilla Rao-Blackwellisation paper Randal and I wrote a (long!) while ago. Given a target π with a dominating measure π⁰≥Mπ, using a Markov kernel to simulate from this dominating measure and subsampling by the importance weight ρ does produce a new Markov chain with the desired target measure as invariant distribution. However, the domination assumption is rather unrealistic, and a generic approach can be implemented without it, by defining an extended Markov chain, with the addition of the number N of replicas as a supplementary term… And a transition kernel R(n|x) on N with expectation ρ, which is a minimal(ist) assumption for the validation of the algorithm. While this initially defines a semi-Markov chain, an extended Markov representation is also feasible, by decreasing N one by one until reaching zero, and this is most helpful in deriving convergence properties for the resulting chain, including a CLT. While the choice of the kernel R is free, the optimal choice is associated with residual sampling, where only the fractional part of ρ is randomised, by a Bernoulli simulation.
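Here is a quick and dirty R rendering of the scheme, with my own choices of a Normal target, a wider Normal instrumental distribution, and a normalisation of the weights by their empirical mean, none of which is prescribed by the paper:

```r
# toy importance Markov chain: run a Metropolis-Hastings chain targeting the
# instrumental density pi0, then replicate each state N times, with E[N|x]
# proportional to rho(x)=pi(x)/pi0(x), the replica number being drawn by
# residual sampling, i.e., floor(rho) plus a Bernoulli on the fractional part
targt <- function(x) dnorm(x)            # pi, the target density
instr <- function(x) dnorm(x, sd = 3)    # pi0, the instrumental density
niter <- 1e4
x <- numeric(niter)
for (t in 2:niter) {                     # random walk Metropolis targeting pi0
  prop <- x[t - 1] + rnorm(1)
  x[t] <- ifelse(runif(1) < instr(prop) / instr(x[t - 1]), prop, x[t - 1])
}
rho <- targt(x) / instr(x)               # importance weights pi/pi0
rho <- rho / mean(rho)                   # normalised to average one (my choice)
N <- floor(rho) + (runif(niter) < rho - floor(rho))  # residual sampling of N
y <- rep(x, times = N)                   # extended sample, targeting pi
```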
null recurrent = zero utility?
Posted in Books, R, Statistics with tags ergodicity, integral priors, Markov chain, Markov kernel, MCMC, null recurrence, R, simulation on April 28, 2022 by xi'an
The stability result that the ratio

$$\frac{\sum_{k=1}^n f(X_k)}{\sum_{k=1}^n g(X_k)}\qquad(1)$$

converges to π(f)/π(g) for a Harris π-null-recurrent Markov chain and all functions f,g in L¹(π) [Meyn & Tweedie, 1993, Theorem 17.3.2] is rather fascinating. However, it is unclear whether it can be useful in simulation environments, as for the integral priors we have been studying over the years with Juan Antonio Cano and Diego Salmeron Martinez. Above, the result of an experiment where I simulated a Markov chain as a Normal random walk in dimension one, hence a Harris null-recurrent Markov chain for the Lebesgue measure λ, and monitored the stabilisation of the ratio (1) when using two densities for f and g, towards its expected value of one (shown by a red horizontal line). There is quite a lot of variability in the outcome (repeated 100 times), but the most intriguing feature is the quick stabilisation of most cumulated averages to values different from 1. Even longer runs display this feature,
which I would blame on the excursions of the random walk far away from the central regions for both f and g, that is, on long stretches where zeroes keep being added to both the numerator and the denominator of (1). As far as integral approximation is concerned, this is not very helpful!
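For the record, here is a minimal R version of this experiment, under my reading of the setup, with dnorm and dcauchy standing in for the two densities f and g:

```r
# Normal random walk, Harris null-recurrent for the Lebesgue measure, and
# cumulated version of the ratio (1) for two densities f and g, which should
# converge to lambda(f)/lambda(g) = 1
niter <- 1e6
x <- cumsum(rnorm(niter))                        # random walk in dimension one
rats <- cumsum(dnorm(x)) / cumsum(dcauchy(x))    # running version of ratio (1)
plot(rats, type = "l", log = "x")
abline(h = 1, col = "red")                       # expected limiting value
```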
optimal choice among MCMC kernels
Posted in Statistics with tags Angkor Wat, Cambodia, delayed acceptance, filamentary distribution, invariance, invariant measure, Markov kernel, Normandie, population Monte Carlo, Siem Reap, sparsity on March 14, 2019 by xi'an
Last week in Siem Reap, Florian Maire [who I discovered originates from a Norman town less than 10km from my hometown!] presented an arXived joint work with Pierre Vandekerkhove at the Data Science & Finance conference in Cambodia that considers the following problem: given a large collection of MCMC kernels, how to pick the best one and how to define what best means. Going by mixtures is a default exploration of the collection, as shown in Tierney (1994) for instance, since the mixture improves on its components (esp. when a kernel is not irreducible on its own!). This paper considers a move to local weights in the mixture, weights that are not estimated from earlier simulations, contrary to what I first understood.
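As a baseline, here is a generic fixed-weight mixture of random walk Metropolis kernels in R, the default version the local weights aim at improving upon, with scales and weights entirely of my choosing:

```r
# mixture of random walk Metropolis kernels with constant weights omega:
# each component is pi-invariant, hence so is the mixture
mixmh <- function(niter, target, scales = c(.1, 10), omega = c(.5, .5)) {
  x <- numeric(niter)
  for (t in 2:niter) {
    i <- sample(length(scales), 1, prob = omega)  # pick one kernel at random
    prop <- x[t - 1] + scales[i] * rnorm(1)       # symmetric random walk move
    x[t] <- ifelse(runif(1) < target(prop) / target(x[t - 1]), prop, x[t - 1])
  }
  x
}
# e.g., a peaked target calling for both small and large moves
out <- mixmh(1e4, function(x) dnorm(x, sd = .1) + dnorm(x, sd = 10))
```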
As made clearer in the paper, the focus is on filamentary distributions that are concentrated nearby lower-dimensional sets or manifolds, since the components of the kernel collection can then be restricted to directions along these manifolds… This includes an interesting case of a 2-D highly peaked target where converging means mostly simulating in x¹ while covering the target means mostly simulating in x², exhibiting a schizophrenic tension between the two goals. Making the weights locally dependent means a correction by a Metropolis step, with cost O(n), as in the sketch below. What of a Rao-Blackwellisation of these mixture weights, from weight × transition to full mixture, as in our PMC paper? Also unclear to me [during the talk] was the use in the mixture of basic Metropolis kernels, which are not absolutely continuous, because of the Dirac mass component, but this is clarified by Section 5 in the paper. A surprising result from the paper (Corollary 1) is that the use of local weights ω(i,x) that depend on the current value of the chain does not jeopardize the stationary measure π(.) of the mixture chain. Which may be due to the fact that all components of the mixture are already π-invariant. Or to the fact that the index of the kernel constitutes an auxiliary (if ancillary) variate. (Algorithm 1 in the paper reminds me of delayed acceptance, making me wonder if computing time should be accounted for.) A final question I briefly discussed with Florian is the extension to weights automatically constructed from the simulations and the target.
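And here is one textbook way to validate locally weighted mixtures, by treating the entire mixture as the Metropolis-Hastings proposal, which brings all n kernels into the acceptance ratio and hence an O(n) cost per iteration. This is a generic rendering of mine (with made-up scales and weight function), not necessarily Algorithm 1 of the paper:

```r
# mixture of random walk proposals with weights omega(i, x) depending on the
# current state: selecting kernel i with probability omega(i, x) and using the
# full mixture densities in the Metropolis-Hastings ratio keeps pi invariant
locmh <- function(niter, target, scales, omega) {
  n <- length(scales)
  x <- numeric(niter)
  for (t in 2:niter) {
    cur <- x[t - 1]
    wc <- omega(1:n, cur) / sum(omega(1:n, cur))   # local weights at x
    i <- sample(n, 1, prob = wc)
    prop <- cur + scales[i] * rnorm(1)
    wp <- omega(1:n, prop) / sum(omega(1:n, prop)) # local weights at y
    qyx <- sum(wc * dnorm(prop, cur, scales))      # mixture density q(y|x)
    qxy <- sum(wp * dnorm(cur, prop, scales))      # mixture density q(x|y)
    acc <- target(prop) * qxy / (target(cur) * qyx)
    x[t] <- ifelse(runif(1) < acc, prop, cur)
  }
  x
}
# e.g., weights favouring the small scale near the peak of the target
out <- locmh(1e4, function(x) dnorm(x, sd = .1) + dnorm(x, sd = 10),
             scales = c(.1, 10), omega = function(i, x) c(1, abs(x))[i])
```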
A chance non-typo
Posted in Books, Statistics, University life with tags Markov kernel, Monte Carlo Statistical Methods, typos on March 18, 2011 by xi'an
A few days ago, my colleague Medhi Dafal in the admin branch of the university came across Monte Carlo Statistical Methods on another colleague's desk and took it with him. Yesterday morning, when I came to his office to burden him with another admin nightmare, he opened the book [at random] and pointed out a formula, pretending he had trouble with it! This was the residual distribution in the Markov chapter,

$$\frac{K(x,y)-\epsilon\,\nu(y)\,\mathbb{I}_C(x)}{1-\epsilon\,\mathbb{I}_C(x)}$$

associated with the splitting of the kernel K over a small set C with minorising measure εν.
When I first saw it, I thought this was indeed a typo, because of the indicator in the denominator. Now that I look at it more carefully, I realise it is correct: since C is a small set, the residual kernel coincides with the standard kernel outside C. Still, for a few minutes, I had this weird impression that he had found a new typo completely at random! Which would have been a significant item of information about the frequency of typos in this book… Phew!