## online approximate Bayesian learning

Posted in Statistics with tags , , , , , , , on September 25, 2020 by xi'an

My friends and coauthors Matthieu Gerber and Randal Douc have just arXived a massive paper on online approximate Bayesian learning, namely the handling of the posterior distribution on the parameters of a state-space model, which remains a challenge to this day… Starting from the iterated batch importance sampling (IBIS) algorithm of Nicolas (Chopin, 2002) which he introduced in his PhD thesis. The online (“by online we mean that the memory and computational requirement to process each observation is finite and bounded uniformly in t”) method they construct is guaranteed for the approximate posterior to converge to the (pseudo-)true value of the parameter as the sample size grows to infinity, where the sequence of approximations is a Cesaro mixture of initial approximations with Gaussian or t priors, AMIS like. (I am somewhat uncertain about the notion of a sequence of priors used in this setup. Another funny feature is the necessity to consider a fat tail t prior from time to time in this sequence!) The sequence is in turn approximated by a particle filter. The computational cost of this IBIS is roughly in O(NT), depending on the regeneration rate.

## ABC with Gibbs steps

Posted in Statistics with tags , , , , , , , , , , , , , , , , , on June 3, 2019 by xi'an

With Grégoire Clarté, Robin Ryder and Julien Stoehr, all from Paris-Dauphine, we have just arXived a paper on the specifics of ABC-Gibbs, which is a version of ABC where the generic ABC accept-reject step is replaced by a sequence of n conditional ABC accept-reject steps, each aiming at an ABC version of a conditional distribution extracted from the joint and intractable target. Hence an ABC version of the standard Gibbs sampler. What makes it so special is that each conditional can (and should) be conditioning on a different statistic in order to decrease the dimension of this statistic, ideally down to the dimension of the corresponding component of the parameter. This successfully bypasses the curse of dimensionality but immediately meets with two difficulties. The first one is that the resulting sequence of conditionals is not coherent, since it is not a Gibbs sampler on the ABC target. The conditionals are thus incompatible and therefore convergence of the associated Markov chain becomes an issue. We produce sufficient conditions for the Gibbs sampler to converge to a stationary distribution using incompatible conditionals. The second problem is then that, provided it exists, the limiting and also intractable distribution does not enjoy a Bayesian interpretation, hence may fail to be justified from an inferential viewpoint. We however succeed in producing a version of ABC-Gibbs in a hierarchical model where the limiting distribution can be explicited and even better can be weighted towards recovering the original target. (At least with limiting zero tolerance.)

## efficient adaptive importance sampling

Posted in Books, Statistics with tags , , , , , , , on June 22, 2018 by xi'an

Bernard Delyon and François Portier just recently arXived a paper on population or evolutionary importance sampling, pointed out to me by Víctor Elvira. Changing the proposal or importance sampler at each iteration. And averaging the estimates across iterations, but also mentioning AMIS. While drawing a distinction that I do not understand, since the simulation cost remains the same, while improving the variance of the resulting estimator. (But the paper points out later that their martingale technique of proof does not apply in this AMIS case.) Some interesting features of the paper are that

• convergence occurs when the total number of simulations grows to infinity, which is the most reasonable scale for assessing the worth of the method;
• some optimality in the oracle sense is established for the method;
• an improvement is found by eliminating outliers and favouring update rate over simulation rate (at a constant cost). Unsurprisingly, the optimal weight of the t-th estimator is given by its inverse variance (with eqn (13) missing an inversion step). Although it relies on the normalised versions of the target and proposal densities, since it assumes the expectation of the ratio is equal to one.

When updating the proposal or importance distribution, the authors consider a parametric family with the update in the parameter being driven by moment or generalised moment matching, or Kullback reduction as in our population Monte Carlo paper. The interesting technical aspects of the paper include the use of martingale and empirical risk arguments. All in all, quite a pleasant surprise to see some follow-up to our work on that topic, more than 10 years later.

## adaptive independent Metropolis-Hastings

Posted in Statistics with tags , , , , , , on May 8, 2018 by xi'an

When rereading this paper by Halden et al. (2009), I was reminded of the earlier and somewhat under-appreciated Gåsemyr (2003). But I find the convergence results therein rather counter-intuitive in that they seem to justify adaptive independent proposals with no strong requirement. Besides the massive Doeblin condition:

“The Doeblin condition essentially requires that all the proposal distribution [sic] has uniformly heavier tails than the target distribution.”

Even when the adaptation is based on an history vector made of rejected values and non-replicated accepted values. Actually  convergence of this sequence of adaptive proposals kernels is established under a concentration of the Doeblin constants a¹,a²,… towards one, in the sense that

E[(1-a¹)(1-a²)…]=0.

The reason may be that, with chains satisfying a Doeblin condition, there is a probability to reach stationarity at each step. Equal to a¹, a², … And hence to ignore adaptivity since each kernel keep the target π invariant. So in the end this is not so astounding. (The paper also reminded me of Wolfgang [or Vincent] Doeblin‘s short and tragic life.)

## exams

Posted in Kids, Statistics, University life with tags , , , , , , , on February 7, 2018 by xi'an
As in every term, here comes the painful week of grading hundreds of exams! My mathematical statistics exam was highly traditional and did not even involve Bayesian material, as the few students who attended the lectures were so eager to discuss sufficiency and ancilarity, that I decided to spend an extra lecture on these notions rather than rushing though conjugate priors. Highly traditional indeed with an inverse Gaussian model and a few basic consequences of Basu’s theorem. actually exposed during this lecture. Plus mostly standard multiple choices about maximum likelihood estimation and R programming… Among the major trends this year, I spotted out the widespread use of strange derivatives of negative powers, the simultaneous derivation of two incompatible convergent estimates, the common mixup between the inverse of a sum and the sum of the inverses, the inability to produce the MLE of a constant transform of the parameter, the choice of estimators depending on the parameter, and a lack of concern for Fisher informations equal to zero.