**I**n connection with the recent PhD thesis defence of Juliette Chevallier, in which I took a somewhat virtual part for being physically in Warwick, I read a paper she wrote with Stéphanie Allassonnière on stochastic approximation versions of the EM algorithm. Computing the MAP estimator can be done via versions of EM adapted for simulated annealing, possibly using MCMC as for instance in the Monolix software and its MCMC-SAEM algorithm. Where SA stands sometimes for stochastic approximation and sometimes for simulated annealing, originally developed by Gilles Celeux and Jean Diebolt, then reframed by Marc Lavielle and Eric Moulines [friends and coauthors]. With an MCMC step because the simulation of the latent variables involves an intractable normalising constant. (Contrary to this paper, Umberto Picchini and Adeline Samson proposed in 2015 a genuine ABC version of this approach, a paper that I thought I had missed [although I now remember discussing it with Adeline at JSM in Seattle], where ABC is used as a substitute for the conditional distribution of the latent variables given data and parameter. To be used as a substitute for the Q step of the (SA)EM algorithm. One more approximation step and one more simulation step and we would reach a form of ABC-Gibbs!) In this version, there are very few assumptions made on the approximation sequence, except that it converges with the iteration index to the true distribution (for a fixed observed sample) if convergence of ABC-SAEM is to happen. The paper takes as an illustrative sequence a collection of tempered versions of the true conditionals, but this is quite formal as I cannot fathom a feasible simulation from the tempered version and not from the untempered one. It is thus much more a version of tempered SAEM than truly connected with ABC (although a genuine ABC-EM version could be envisioned).
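
For readers who have not met SAEM before, here is a minimal Python sketch of the stochastic approximation idea, on a toy model of my own choosing rather than anything from the paper: latent z_i ~ N(θ,1) and observations y_i | z_i ~ N(z_i,1), with a flat prior so that MAP and MLE coincide at the average of the y_i's. The function name `saem_toy`, the step sizes and the burn-in length are all assumptions made for illustration. In this toy the conditional of the latent variables is available in closed form, so no MCMC step and no tempering are needed.

```python
import random
import statistics


def saem_toy(y, n_iter=2000, burn_in=50, seed=0):
    """SAEM for the toy model z_i ~ N(theta, 1), y_i | z_i ~ N(z_i, 1)."""
    rng = random.Random(seed)
    theta = 0.0  # current parameter value
    s = 0.0      # running approximation of E[mean(z) | y, theta]
    for k in range(1, n_iter + 1):
        # Simulation step: here z_i | y_i, theta ~ N((y_i + theta) / 2, 1/2)
        # can be sampled exactly; MCMC-SAEM would use a Markov kernel instead.
        z = [rng.gauss((yi + theta) / 2.0, 0.5 ** 0.5) for yi in y]
        # Step sizes: 1 during burn-in, then decreasing as 1 / (k - burn_in).
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        # Stochastic approximation step on the sufficient statistic mean(z).
        s += gamma * (statistics.fmean(z) - s)
        # Maximisation step: the complete-data MLE of theta is mean(z).
        theta = s
    return theta


# Hypothetical data simulated from the toy model with theta = 1.
data_rng = random.Random(42)
y_obs = [data_rng.gauss(data_rng.gauss(1.0, 1.0), 1.0) for _ in range(200)]
theta_hat = saem_toy(y_obs)
print(theta_hat, statistics.fmean(y_obs))
```

The two printed values should be close, since the marginal MLE in this toy is the sample mean. Replacing the exact simulation of z by draws from an approximation that improves with k is where tempered (or ABC) versions would enter.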

## Archive for ABC-Gibbs

## ABC-SAEM

Posted in Books, Statistics, University life with tags ABC, ABC-Gibbs, ABC-MCMC, Alan Turing, École Polytechnique, EM, JSM 2015, MAP estimators, MCMC, MCMC-SAEM, Monolix, Paris-Saclay campus, PhD thesis, SAEM, Seattle, simulated annealing, stochastic approximation, University of Warwick, well-tempered algorithm on October 8, 2019 by xi'an

## ABC in Clermont-Ferrand

Posted in Mountains, pictures, Statistics, Travel, University life with tags ABC, ABC-Gibbs, Approximate Bayesian computation, Auvergne, Clermont-Ferrand, conditional sufficiency, cosmostats, dimension reduction, Gibbs sampling, likelihood-free methods, PMC, volcano on September 20, 2019 by xi'an

**T**oday I am taking part in a one-day workshop at the Université of Clermont Auvergne on ABC. With applications to cosmostatistics, along with Martin Kilbinger [with whom I worked on PMC schemes], Florent Leclerc and Grégoire Aufort. This should prove a most exciting day! (With not enough time to run up Puy de Dôme in the morning, though.)

## likelihood-free approximate Gibbs sampling

Posted in Books, Statistics with tags ABC, ABC-Gibbs, ABC-within-Gibbs, curse of dimensionality, expectation-propagation, Gibbs sampling, local regression, neural network, summary statistics on June 19, 2019 by xi'an

“Low-dimensional regression-based models are constructed for each of these conditional distributions using synthetic (simulated) parameter value and summary statistic pairs, which then permit approximate Gibbs update steps (…) synthetic datasets are not generated during each sampler iteration, thereby providing efficiencies for expensive simulator models, and only require sufficient synthetic datasets to adequately construct the full conditional models (…) Construction of the approximate conditional distributions can exploit known structures of the high-dimensional posterior, where available, to considerably reduce computational overheads”

**G**uilherme Souza Rodrigues, David Nott, and Scott Sisson have just arXived a paper on approximate Gibbs sampling. Since this comes a few days after we posted our own version, here are some of the differences I could spot in the paper:

- Further references to earlier occurrences of Gibbs versions of ABC, esp. in cases when the likelihood function factorises into components and allows for summaries with lower dimensions. And even to ESP.
- More an ABC version of Gibbs sampling than a Gibbs version of ABC in that approximations to the conditionals are first constructed and then used with no further corrections.
- Inherently related to regression post-processing à la Beaumont et al. (2002) in that the regression model is the start to designing an approximate full conditional, conditional on the “other” parameters and on the overall summary statistic. The construction of the approximation is far from automated. And may involve neural networks or other machine learning estimates.
- As a consequence of the above, a preliminary ABC step to design the collection of approximate full conditionals using a single and all-purpose multidimensional summary statistic.
- Once the approximations are constructed, no further pseudo-data is generated.
- Drawing from the approximate full conditionals is done exactly, possibly via a bootstrapped version.
- Handling a highly complex g-and-k dynamic model with 13,140 unknown parameters, requiring a ten-day simulation.
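
To fix ideas on the "fit first, then sample with no further pseudo-data" structure described in the list above, here is a rough Python sketch under strong assumptions of mine: a two-parameter Gaussian toy model, a global linear regression with Gaussian residuals for each full conditional (the paper considers local regressions and more flexible learners), and hypothetical helper names (`simulate_pairs`, `least_squares`, `fit_conditional`, `approx_gibbs`). It should not be read as the authors' algorithm.

```python
import random


def simulate_pairs(n_sims, seed=3):
    """Preliminary step: (theta1, theta2, s1, s2) draws from prior and model."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        t1, t2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        sims.append((t1, t2, rng.gauss(t1, 0.5), rng.gauss(t2 + 0.5 * t1, 0.5)))
    return sims


def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]


def fit_conditional(sims, j):
    """Regress theta_j on (1, other theta, s1, s2); return (beta, resid sd)."""
    X = [[1.0, row[1 - j], row[2], row[3]] for row in sims]
    y = [row[j] for row in sims]
    beta = least_squares(X, y)
    resid = [t - sum(b * v for b, v in zip(beta, r)) for r, t in zip(X, y)]
    sd = (sum(e * e for e in resid) / (len(y) - len(beta))) ** 0.5
    return beta, sd


def approx_gibbs(sims, s_obs, n_iter=5000, seed=4):
    """Gibbs sweeps using only the fitted conditionals, no new pseudo-data."""
    rng = random.Random(seed)
    models = [fit_conditional(sims, 0), fit_conditional(sims, 1)]
    theta = [0.0, 0.0]
    chain = []
    for _ in range(n_iter):
        for j in (0, 1):
            beta, sd = models[j]
            feats = [1.0, theta[1 - j], s_obs[0], s_obs[1]]
            theta[j] = rng.gauss(sum(b * v for b, v in zip(beta, feats)), sd)
        chain.append(tuple(theta))
    return chain


sims = simulate_pairs(5000)
chain = approx_gibbs(sims, s_obs=(1.0, 0.5))
print(sum(t[0] for t in chain) / len(chain), sum(t[1] for t in chain) / len(chain))
```

In this jointly Gaussian toy the linear regressions are well specified, so the chain averages should land near the exact posterior means (21/26 and 2/26 for this choice of observed summaries). With misspecified regressions the fitted conditionals need not be compatible, which is precisely the convergence issue raised below.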

“In certain circumstances it can be seen that the likelihood-free approximate Gibbs sampler will exactly target the true partial posterior (…) In this case, then Algorithms 2 and 3 will be exact.”

Convergence and coherence are handled in the paper by setting the algorithm(s) as noisy Monte Carlo versions, à la Alquier et al., although the issue of incompatibility between the full conditionals is acknowledged, with the main reference being the finite state space analysis of Chen and Ip (2015). It thus remains unclear whether or not the Gibbs samplers that are implemented there do converge and if they do what is the significance of the stationary distribution.

## A precursor of ABC-Gibbs

Posted in Books, R, Statistics with tags ABC, ABC-Gibbs, compatible conditional distributions, Genetics, Gibbs sampler, high dimensions, incoherent inference, incompatible conditionals, insufficiency, likelihood-free methods, sufficient statistics on June 7, 2019 by xi'an

**F**ollowing our arXival of ABC-Gibbs, Dennis Prangle pointed out to us a 2016 paper by Athanasios Kousathanas, Christoph Leuenberger, Jonas Helfer, Mathieu Quinodoz, Matthieu Foll, and Daniel Wegmann, Likelihood-Free Inference in High-Dimensional Models, published in Genetics, Vol. 203, 893–904 in June 2016. This paper contains a version of ABC Gibbs where parameters are sequentially simulated from conditionals that depend on the data only through low-dimensional conditionally sufficient statistics. I had actually blogged about this paper in 2015 but had since then completely forgotten about it. (The comments I had made at the time still hold, already pertaining to the coherence or lack thereof of the sampler. I had also forgotten I had run an experiment of an exact Gibbs sampler with incoherent conditionals, which then seemed to converge to something, if not the exact posterior.)

All ABC algorithms, including ABC-PaSS introduced here, require that statistics are sufficient for estimating the parameters of a given model. As mentioned above, parameter-wise sufficient statistics as required by ABC-PaSS are trivial to find for distributions of the exponential family. Since many population genetics models do not follow such distributions, sufficient statistics are known for the most simple models only. For more realistic models involving multiple populations or population size changes, only approximately-sufficient statistics can be found.

While Gibbs sampling is not mentioned in the paper, this is indeed a form of ABC-Gibbs, with the advantage of not facing convergence issues thanks to the sufficiency. The drawback being that this setting is restricted to exponential families and hence difficult to extrapolate to non-exponential distributions, as using almost-sufficient (or not) summary statistics leads to incompatible conditionals and thus jeopardises the convergence of the sampler. When thinking a wee bit more about the case treated by Kousathanas et al., I am actually uncertain about the validation of the sampler. When tolerance is equal to zero, this is not an issue as it reproduces the regular Gibbs sampler. Otherwise, each conditional ABC step amounts to introducing an auxiliary variable represented by the simulated summary statistic. Since the distribution of this summary statistic depends on more than the parameter for which it is sufficient, in general, it should also appear in the conditional distribution of other parameters. At least from this Gibbs perspective, it thus relies on incompatible conditionals, which makes the conditions proposed in our own paper all the more relevant.

## ABC with Gibbs steps

Posted in Statistics with tags ABC, ABC-Gibbs, Approximate Bayesian computation, Bayesian inference, bois de Boulogne, compatible conditional distributions, contraction, convergence, ergodicity, France, Gibbs sampler, hierarchical Bayesian modelling, incompatible conditionals, La Défense, Paris, stationarity, tolerance, Université Paris Dauphine on June 3, 2019 by xi'an

**W**ith Grégoire Clarté, Robin Ryder and Julien Stoehr, all from Paris-Dauphine, we have just arXived a paper on the specifics of ABC-Gibbs, which is a version of ABC where the generic ABC accept-reject step is replaced by a sequence of n conditional ABC accept-reject steps, each aiming at an ABC version of a conditional distribution extracted from the joint and intractable target. Hence an ABC version of the standard Gibbs sampler. What makes it so special is that each conditional can (and should) be conditioning on a different statistic in order to decrease the dimension of this statistic, ideally down to the dimension of the corresponding component of the parameter. This successfully bypasses the curse of dimensionality but immediately meets with two difficulties. The first one is that the resulting sequence of conditionals is not coherent, since it is not a Gibbs sampler on the ABC target. The conditionals are thus incompatible and therefore convergence of the associated Markov chain becomes an issue. We produce sufficient conditions for the Gibbs sampler to converge to a stationary distribution using incompatible conditionals. The second problem is then that, provided it exists, the limiting and also intractable distribution does not enjoy a Bayesian interpretation, hence may fail to be justified from an inferential viewpoint. We however succeed in producing a version of ABC-Gibbs in a hierarchical model where the limiting distribution can be made explicit and, even better, can be weighted towards recovering the original target. (At least with limiting zero tolerance.)
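
As an illustration of the mechanics (and nothing more), here is a small Python sketch of such a sampler on a hierarchical normal toy model, with α ~ U(-4,4), μ_j | α ~ N(α,1) and K observations x_jk | μ_j ~ N(μ_j,1). The model, the tolerances, the choice of statistics (the average of the μ_j's for α, the group mean for each μ_j) and the name `abc_gibbs` are assumptions of this sketch and may differ from the implementation in the paper.

```python
import random
import statistics


def abc_gibbs(xbar, K, n_iter=300, eps_alpha=0.3, eps_mu=0.3, seed=1):
    """ABC-Gibbs on a hierarchical normal toy model; returns the alpha chain."""
    rng = random.Random(seed)
    n = len(xbar)
    mu = list(xbar)  # start each mu_j at its observed group mean
    alpha_chain = []
    for _ in range(n_iter):
        # Conditional ABC step for alpha given mu: the "data" is the current
        # mu vector, summarised by its average (a one-dimensional statistic).
        target = statistics.fmean(mu)
        while True:
            alpha = rng.uniform(-4.0, 4.0)
            pseudo_mu = [rng.gauss(alpha, 1.0) for _ in range(n)]
            if abs(statistics.fmean(pseudo_mu) - target) < eps_alpha:
                break
        # Conditional ABC step for each mu_j given alpha: only group j is
        # simulated and compared through its own mean.
        for j in range(n):
            while True:
                cand = rng.gauss(alpha, 1.0)
                pseudo = [rng.gauss(cand, 1.0) for _ in range(K)]
                if abs(statistics.fmean(pseudo) - xbar[j]) < eps_mu:
                    mu[j] = cand
                    break
        alpha_chain.append(alpha)
    return alpha_chain


# Hypothetical data: 10 groups of K = 5 observations, true alpha = 1.
data_rng = random.Random(7)
K_obs = 5
true_mu = [data_rng.gauss(1.0, 1.0) for _ in range(10)]
xbar_obs = [statistics.fmean([data_rng.gauss(m, 1.0) for _ in range(K_obs)])
            for m in true_mu]
alpha_chain = abc_gibbs(xbar_obs, K_obs)
print(statistics.fmean(alpha_chain[50:]), statistics.fmean(xbar_obs))
```

Every accept-reject step compares a one-dimensional statistic, whatever the number of groups, which is the whole point. Whether the chain so produced has a stationary distribution, and what that distribution means, is the question addressed in the paper rather than by this sketch.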

## likelihood-free inference in high-dimensional models

Posted in Books, R, Statistics, University life with tags ABC, ABC-Gibbs, compatible conditional distributions, convergence of Gibbs samplers, curse of dimensionality, exact ABC, Gibbs sampling, median, median absolute deviation, R on September 1, 2015 by xi'an

“…for a general linear model (GLM), a single linear function is a sufficient statistic for each associated parameter…”

The recently arXived paper “Likelihood-free inference in high-dimensional models”, by Kousathanas et al. (July 2015), proposes an ABC resolution of the dimensionality curse [when the dimension of the parameter and of the corresponding summary statistics] by turning Gibbs-like and by using a component-by-component ABC-MCMC update that allows for low dimensional statistics. In the (rare) event there exists a conditional sufficient statistic for each component of the parameter vector, the approach is just as justified as when using a generic ABC-Gibbs method based on the whole data. Otherwise, that is, when using a non-sufficient estimator of the corresponding component (as, e.g., in a generalised [not general!] linear model), the approach is less coherent as there is no joint target associated with the Gibbs moves. One may therefore wonder at the convergence properties of the resulting algorithm. The only safe case [in dimension 2] is when one of the restricted conditionals does not depend on the other parameter. Note also that each Gibbs step a priori requires the simulation of a new pseudo-dataset, which may be a major imposition on computing time. And that setting the tolerance for each parameter is a delicate calibration issue because in principle the tolerance should depend on the other component values.
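
A component-by-component ABC-MCMC update of this kind can be sketched in a few lines of Python. The setting below is an assumption for illustration: a N(μ,σ²) sample with flat priors on bounded ranges, the median as statistic for μ and the median absolute deviation (MAD) as statistic for σ, with hypothetical names (`med_mad`, `componentwise_abc_mcmc`) and hand-picked tolerances and random walk scales. It is not the algorithm of the paper.

```python
import random
import statistics


def med_mad(x):
    """Median and (unscaled) median absolute deviation of a sample."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    return med, mad


def componentwise_abc_mcmc(x_obs, n_iter=3000, eps=(0.2, 0.2),
                           step=(0.3, 0.3), seed=2):
    """One-parameter-at-a-time ABC-MCMC for (mu, sigma) in a normal model."""
    rng = random.Random(seed)
    n = len(x_obs)
    med_obs, mad_obs = med_mad(x_obs)
    # Start at moment-like estimates (0.6745 is the MAD of a N(0,1) variate).
    mu, sigma = med_obs, mad_obs / 0.6745
    chain = []
    for _ in range(n_iter):
        # Update mu: random walk proposal, new pseudo-dataset, compare medians.
        mu_prop = mu + rng.gauss(0.0, step[0])
        if -10.0 < mu_prop < 10.0:  # flat prior support for mu
            pseudo = [rng.gauss(mu_prop, sigma) for _ in range(n)]
            if abs(statistics.median(pseudo) - med_obs) < eps[0]:
                mu = mu_prop
        # Update sigma: another pseudo-dataset, compare MADs only.
        sigma_prop = sigma + rng.gauss(0.0, step[1])
        if 0.0 < sigma_prop < 10.0:  # flat prior support for sigma
            pseudo = [rng.gauss(mu, sigma_prop) for _ in range(n)]
            if abs(med_mad(pseudo)[1] - mad_obs) < eps[1]:
                sigma = sigma_prop
        chain.append((mu, sigma))
    return chain


# Hypothetical data: 50 draws from N(2, 1.5^2).
data_rng = random.Random(11)
x_obs = [data_rng.gauss(2.0, 1.5) for _ in range(50)]
chain = componentwise_abc_mcmc(x_obs)
print(statistics.fmean(c[0] for c in chain), statistics.fmean(c[1] for c in chain))
```

Note that two pseudo-datasets are simulated per sweep, one per component, which is the computing cost mentioned above. In this particular toy the distribution of the MAD does not depend on μ, so the σ update is free of the other parameter, which puts it in the safe case in dimension 2.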