My friends and coauthors Matthieu Gerber and Randal Douc have just arXived a massive paper on online approximate Bayesian learning, namely the handling of the posterior distribution on the parameters of a state-space model, which remains a challenge to this day… Starting from the iterated batch importance sampling (IBIS) algorithm of Nicolas (Chopin, 2002) which he introduced in his PhD thesis. The online (“by online we mean that the memory and computational requirement to process each observation is finite and bounded uniformly in t”) method they construct is guaranteed for the approximate posterior to converge to the (pseudo-)true value of the parameter as the sample size grows to infinity, where the sequence of approximations is a Cesaro mixture of initial approximations with Gaussian or t priors, AMIS like. (I am somewhat uncertain about the notion of a sequence of priors used in this setup. Another funny feature is the necessity to consider a fat tail t prior from time to time in this sequence!) The sequence is in turn approximated by a particle filter. The computational cost of this IBIS is roughly in O(NT), depending on the regeneration rate.
Archive for convergence
online approximate Bayesian learning
Posted in Statistics with tags AMIS, approximate MCMC, Bayesian learning, consistency, convergence, ibis, sequential Monte Carlo, state space model on September 25, 2020 by xi'anABC with Gibbs steps
Posted in Statistics with tags ABC, ABC-Gibbs, Approximate Bayesian computation, Bayesian inference, bois de Boulogne, compatible conditional distributions, contraction, convergence, ergodicity, France, Gibbs sampler, hierarchical Bayesian modelling, incompatible conditionals, La Défense, Paris, stationarity, tolerance, Université Paris Dauphine on June 3, 2019 by xi'anWith Grégoire Clarté, Robin Ryder and Julien Stoehr, all from Paris-Dauphine, we have just arXived a paper on the specifics of ABC-Gibbs, which is a version of ABC where the generic ABC accept-reject step is replaced by a sequence of n conditional ABC accept-reject steps, each aiming at an ABC version of a conditional distribution extracted from the joint and intractable target. Hence an ABC version of the standard Gibbs sampler. What makes it so special is that each conditional can (and should) be conditioning on a different statistic in order to decrease the dimension of this statistic, ideally down to the dimension of the corresponding component of the parameter. This successfully bypasses the curse of dimensionality but immediately meets with two difficulties. The first one is that the resulting sequence of conditionals is not coherent, since it is not a Gibbs sampler on the ABC target. The conditionals are thus incompatible and therefore convergence of the associated Markov chain becomes an issue. We produce sufficient conditions for the Gibbs sampler to converge to a stationary distribution using incompatible conditionals. The second problem is then that, provided it exists, the limiting and also intractable distribution does not enjoy a Bayesian interpretation, hence may fail to be justified from an inferential viewpoint. We however succeed in producing a version of ABC-Gibbs in a hierarchical model where the limiting distribution can be explicited and even better can be weighted towards recovering the original target. (At least with limiting zero tolerance.)
efficient adaptive importance sampling
Posted in Books, Statistics with tags AMIS, convergence, generalised moments, importance sampling, Kullback-Leibler divergence, martingales, oracle inequalities, population Monte Carlo on June 22, 2018 by xi'anBernard Delyon and François Portier just recently arXived a paper on population or evolutionary importance sampling, pointed out to me by Víctor Elvira. Changing the proposal or importance sampler at each iteration. And averaging the estimates across iterations, but also mentioning AMIS. While drawing a distinction that I do not understand, since the simulation cost remains the same, while improving the variance of the resulting estimator. (But the paper points out later that their martingale technique of proof does not apply in this AMIS case.) Some interesting features of the paper are that
- convergence occurs when the total number of simulations grows to infinity, which is the most reasonable scale for assessing the worth of the method;
- some optimality in the oracle sense is established for the method;
- an improvement is found by eliminating outliers and favouring update rate over simulation rate (at a constant cost). Unsurprisingly, the optimal weight of the t-th estimator is given by its inverse variance (with eqn (13) missing an inversion step). Although it relies on the normalised versions of the target and proposal densities, since it assumes the expectation of the ratio is equal to one.
When updating the proposal or importance distribution, the authors consider a parametric family with the update in the parameter being driven by moment or generalised moment matching, or Kullback reduction as in our population Monte Carlo paper. The interesting technical aspects of the paper include the use of martingale and empirical risk arguments. All in all, quite a pleasant surprise to see some follow-up to our work on that topic, more than 10 years later.
adaptive independent Metropolis-Hastings
Posted in Statistics with tags adaptive MCMC, convergence, Doeblin's condition, independent Metropolis-Hastings algorithm, Markov chains, MCMC algorithms, Wolfgang Doeblin on May 8, 2018 by xi'anWhen rereading this paper by Halden et al. (2009), I was reminded of the earlier and somewhat under-appreciated Gåsemyr (2003). But I find the convergence results therein rather counter-intuitive in that they seem to justify adaptive independent proposals with no strong requirement. Besides the massive Doeblin condition:
“The Doeblin condition essentially requires that all the proposal distribution [sic] has uniformly heavier tails than the target distribution.”
Even when the adaptation is based on an history vector made of rejected values and non-replicated accepted values. Actually convergence of this sequence of adaptive proposals kernels is established under a concentration of the Doeblin constants a¹,a²,… towards one, in the sense that
E[(1-a¹)(1-a²)…]=0.
The reason may be that, with chains satisfying a Doeblin condition, there is a probability to reach stationarity at each step. Equal to a¹, a², … And hence to ignore adaptivity since each kernel keep the target π invariant. So in the end this is not so astounding. (The paper also reminded me of Wolfgang [or Vincent] Doeblin‘s short and tragic life.)
exams
Posted in Kids, Statistics, University life with tags Basu's theorem, bootstrap, convergence, copies, correction, exam, mathematical statistics, Université Paris Dauphine on February 7, 2018 by xi'an