Archive for compatible conditional distributions

EM degeneracy

Posted in pictures, Statistics, Travel, University life on June 16, 2021 by xi'an

At the MHC 2021 conference today (which I biked to and attended for real, the first time since BayesComp!), I listened to Christophe Biernacki expose the dangers of EM applied to mixtures in the presence of missing data. The algorithm becomes increasingly likely to reach a degenerate solution, that is, a component supported by a single observation, and this probability grows with the proportion of missing data. This is not hugely surprising, as there is a real (global) mode at this solution. If single-observation components are prohibited, they should not be accepted in the EM update either. Just as in Bayesian analyses with improper priors, the likelihood should bar components made of one or two observations… Which of course makes EM harder to implement. Or not?! MCEM, SEM and Gibbs are obviously straightforward to modify in this case.
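The degeneracy can be reproduced on the likelihood itself. Below is a minimal Python sketch. It assumes a two-component univariate normal mixture with unequal variances and a handful of made-up observations, and it is my own illustration, not code from the talk. Centring one component on a single observation and shrinking its standard deviation makes the log-likelihood grow without bound. The second function sketches the kind of modification that is easy for SEM or Gibbs: it draws the allocations from their conditional distribution and rejects any draw that leaves a component with fewer than two observations. The function names and the `min_size=2` threshold are hypothetical choices.

```python
import math
import random


def normal_pdf(y, mu, s):
    """Density of N(mu, s^2) at y."""
    return math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def mixture_loglik(x, p, mu, s):
    """Log-likelihood of p N(mu[0], s[0]^2) + (1 - p) N(mu[1], s[1]^2)."""
    return sum(
        math.log(p * normal_pdf(y, mu[0], s[0]) + (1 - p) * normal_pdf(y, mu[1], s[1]))
        for y in x
    )


def constrained_allocation(x, p, mu, s, rng, min_size=2, max_tries=10_000):
    """SEM/Gibbs-style allocation step (hypothetical helper): draw the labels
    from their conditional probabilities and reject any draw that gives a
    component fewer than min_size observations."""
    for _ in range(max_tries):
        z = []
        for y in x:
            w0 = p * normal_pdf(y, mu[0], s[0])
            w1 = (1 - p) * normal_pdf(y, mu[1], s[1])
            z.append(0 if rng.random() < w0 / (w0 + w1) else 1)
        if min(z.count(0), z.count(1)) >= min_size:
            return z
    raise RuntimeError("no admissible allocation found")


x = [-1.2, 0.3, 0.8, 1.5, 2.1]  # made-up data
# Component 0 sits on x[0] with a shrinking scale; component 1 covers the rest.
for s0 in (1e-2, 1e-4, 1e-8):
    print(s0, mixture_loglik(x, 0.5, (x[0], 0.7), (s0, 1.0)))

print(constrained_allocation(x, 0.5, (0.0, 1.5), (1.0, 1.0), random.Random(1)))
```

The rejection loop samples the labels exactly from their conditional distribution restricted to non-degenerate allocations. This lets it slot into a Gibbs sampler whose target excludes such allocations.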

Judith Rousseau also gave a fascinating talk on the properties of nonparametric mixtures, from a surprisingly light set of conditions for identifiability to posterior consistency. The talk made an interesting simultaneous use of several priors, a particular case of cut models: namely, a correct joint distribution that cannot be a posterior, although this does not affect simulation. There was also a nice trick turning a hidden Markov chain into a fully finite hidden Markov chain, since this suffices to recover a Bernstein-von Mises asymptotic, if an inefficient one. Sylvain LeCorff presented a pseudo-marginal sequential sampler for smoothing, where the transition densities are replaced by unbiased estimators, with connections to approximate Bayesian computation (ABC) smoothing. This proves harder than I first imagined because of the backward-sampling operations…

too many marginals

Posted in Kids, Statistics on February 3, 2020 by xi'an

This week, the CEREMADE coffee room puzzle was about finding a joint distribution for (X,Y) such that X and Y are both marginally U(0,1), while X+Y is U(½,1+½). Beyond the peculiarity of the question, there is a larger-scale problem: how many (if any) compatible marginal distributions of h¹(X,Y), h²(X,Y), h³(X,Y), … must be constrained in order to reconstruct the joint distribution. A related question is whether any Gibbs-like scheme is available to simulate from the joint.
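A quick moment argument narrows down the candidates. E[X+Y] = 1 matches the mean of U(½,1+½). Since Var(X+Y) must equal 1/12 while Var(X) + Var(Y) = 1/6, any solution must have Cov(X,Y) = -1/24, i.e., a correlation of -½. One construction satisfying all three constraints is Y = (X + ½) mod 1. This is my own check, not necessarily the answer found in the coffee room. For X < ½ the sum 2X + ½ is uniform on (½, 3/2) with mass ½, and for X ≥ ½ the sum 2X - ½ covers the same interval with the remaining mass ½. The sketch below checks this by simulation. The helper names `sample_pair` and `bin_fractions` are hypothetical.

```python
import random


def sample_pair(rng):
    """One draw of (X, Y) with X ~ U(0,1) and Y = (X + 1/2) mod 1."""
    x = rng.random()
    return x, (x + 0.5) % 1.0


def bin_fractions(values, lo, hi, nbins):
    """Fraction of values in each of nbins equal-width bins of [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        k = min(int((v - lo) / width), nbins - 1)  # clamp the right endpoint
        counts[k] += 1
    return [c / len(values) for c in counts]


rng = random.Random(2020)
pairs = [sample_pair(rng) for _ in range(100_000)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
sums = [x + y for x, y in pairs]

n = len(pairs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
cov_xy = sum(x * y for x, y in pairs) / n - mean_x * mean_y

print(bin_fractions(ys, 0.0, 1.0, 4))    # Y: each bin should be close to 0.25
print(bin_fractions(sums, 0.5, 1.5, 4))  # X + Y: each bin should be close to 0.25
print(cov_xy)                            # should be close to -1/24
```

Since the solution is deterministic given X, it also shows that the three marginal constraints alone can be met by a degenerate joint. This bears on how many marginals are needed before the joint is pinned down.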

A precursor of ABC-Gibbs

Posted in Books, R, Statistics on June 7, 2019 by xi'an

Following our arXival of ABC-Gibbs, Dennis Prangle pointed out to us a 2016 paper by Athanasios Kousathanas, Christoph Leuenberger, Jonas Helfer, Mathieu Quinodoz, Matthieu Foll, and Daniel Wegmann, Likelihood-Free Inference in High-Dimensional Models, published in Genetics, Vol. 203, 893–904, in June 2016. This paper contains a version of ABC-Gibbs where parameters are sequentially simulated from conditionals that depend on the data only through low-dimensional, conditionally sufficient statistics. I had actually blogged about this paper in 2015 but had since completely forgotten about it. (The comments I made at the time still hold, as they already pertained to the coherence, or lack thereof, of the sampler. I had also forgotten that I had run an experiment with an exact Gibbs sampler based on incoherent conditionals, which then seemed to converge to something, if not to the exact posterior.)
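How can a Gibbs sampler with incoherent conditionals "converge to something"? Here is a minimal sketch. It is not the 2015 experiment, whose details are not reproduced here. It assumes the two Gaussian conditionals X|Y ~ N(aY, 1) and Y|X ~ N(bX, 1). With unit conditional variances these are compatible with a bivariate normal only when a = b. Yet for |ab| < 1 the systematic-scan chain is a stable autoregression with a stationary distribution. Under that distribution Y|X is N(bX, 1) by construction, but the regression of X on Y has slope b(1+a²)/(1+b²), which differs from a whenever a ≠ b and ab ≠ 1. The values a = 0.5 and b = 0.9 and the function names are hypothetical.

```python
import random


def incompatible_gibbs(a, b, n_iter, burn_in, seed=0):
    """Systematic-scan Gibbs sampler alternating X|Y ~ N(aY, 1) and Y|X ~ N(bX, 1).
    The conditionals are incompatible unless a == b, but the chain is still a
    stable AR(1) process when |a * b| < 1."""
    rng = random.Random(seed)
    x = y = 0.0
    draws = []
    for t in range(n_iter + burn_in):
        x = rng.gauss(a * y, 1.0)
        y = rng.gauss(b * x, 1.0)
        if t >= burn_in:
            draws.append((x, y))
    return draws


def ols_slope(u, v):
    """Least-squares slope of u regressed on v (with intercept)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / n
    var = sum((vi - mv) ** 2 for vi in v) / n
    return cov / var


a, b = 0.5, 0.9
draws = incompatible_gibbs(a, b, n_iter=200_000, burn_in=1_000)
xs = [x for x, _ in draws]
ys = [y for _, y in draws]

print(ols_slope(ys, xs))  # close to b = 0.9: the Y|X conditional is reproduced
print(ols_slope(xs, ys))  # close to b(1+a^2)/(1+b^2), about 0.62, not a = 0.5
```

The limit is thus a well-defined distribution, just not one whose two conditionals are those fed to the sampler. Which of the two conditionals it reproduces depends on the scan order.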

“All ABC algorithms, including ABC-PaSS introduced here, require that statistics are sufficient for estimating the parameters of a given model. As mentioned above, parameter-wise sufficient statistics as required by ABC-PaSS are trivial to find for distributions of the exponential family. Since many population genetics models do not follow such distributions, sufficient statistics are known for the most simple models only. For more realistic models involving multiple populations or population size changes, only approximately-sufficient statistics can be found.”

While Gibbs sampling is not mentioned in the paper, this is indeed a form of ABC-Gibbs, with the advantage of not facing convergence issues thanks to sufficiency. The drawback is that this setting is restricted to exponential families and hence difficult to extend to non-exponential distributions: using almost-sufficient (or non-sufficient) summary statistics leads to incompatible conditionals and thus jeopardises the convergence of the sampler. When thinking a wee bit more about the case treated by Kousathanas et al., I am actually uncertain about the validity of the sampler. When the tolerance is equal to zero, this is not an issue, as the sampler reproduces the regular Gibbs sampler. Otherwise, each conditional ABC step amounts to introducing an auxiliary variable, namely the simulated summary statistic. Since the distribution of this summary statistic generally depends on more than the parameter for which it is sufficient, it should also appear in the conditional distributions of the other parameters. At least from this Gibbs perspective, the sampler thus relies on incompatible conditionals, which makes the conditions proposed in our own paper all the more relevant.

ABC with Gibbs steps

Posted in Statistics on June 3, 2019 by xi'an

With Grégoire Clarté, Robin Ryder and Julien Stoehr, all from Paris-Dauphine, we have just arXived a paper on the specifics of ABC-Gibbs. This is a version of ABC where the generic ABC accept-reject step is replaced by a sequence of n conditional ABC accept-reject steps, each aiming at an ABC version of a conditional distribution extracted from the joint and intractable target. Hence an ABC version of the standard Gibbs sampler. What makes it so special is that each conditional can (and should) condition on a different statistic, in order to decrease the dimension of this statistic, ideally down to the dimension of the corresponding component of the parameter. This successfully bypasses the curse of dimensionality but immediately runs into two difficulties. The first one is that the resulting sequence of conditionals is not coherent, since it is not a Gibbs sampler on the ABC target. The conditionals are thus incompatible, and convergence of the associated Markov chain becomes an issue. We produce sufficient conditions for the Gibbs sampler to converge to a stationary distribution using incompatible conditionals. The second problem is then that, provided it exists, the limiting (and also intractable) distribution does not enjoy a Bayesian interpretation, hence may fail to be justified from an inferential viewpoint. We however succeed in producing a version of ABC-Gibbs in a hierarchical model where the limiting distribution can be made explicit and, even better, can be weighted towards recovering the original target. (At least in the limit of zero tolerance.)
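Here is a minimal ABC-Gibbs sketch. It assumes a toy hierarchical normal model of my own choosing: α ~ U(-4, 4), μ_j | α ~ N(α, 1), x_ij | μ_j ~ N(μ_j, 1). This is similar in spirit to a hierarchical example, but it is not claimed to reproduce the paper's experiments. Each μ_j is updated by an ABC rejection step that only compares the mean of its own group of data. α is updated by one that only compares the mean of the current μ_j's, so every step conditions on a one-dimensional statistic. Both statistics are sufficient for the corresponding exact conditionals, so the sampler reduces to the exact Gibbs sampler as the tolerance `eps` goes to zero. With `eps > 0`, the steps are the incompatible ABC conditionals discussed above. The function name, the tolerance value and the simulated data are hypothetical.

```python
import random
import statistics


def abc_gibbs(xbar, n_obs, n_iter, eps, seed=0):
    """ABC-Gibbs sketch for the toy hierarchical model
        alpha ~ U(-4, 4),  mu_j | alpha ~ N(alpha, 1),  x_ij | mu_j ~ N(mu_j, 1),
    observed through the group means xbar[j] of n_obs observations each.
    Returns the chain of alpha values."""
    rng = random.Random(seed)
    n_groups = len(xbar)
    alpha = 0.0
    mu = list(xbar)  # start the group means at their data summaries
    chain = []
    for _ in range(n_iter):
        # ABC step for each mu_j | alpha, x_j: the only summary compared is xbar[j].
        for j in range(n_groups):
            while True:
                prop = rng.gauss(alpha, 1.0)
                # The mean of n_obs N(prop, 1) draws is simulated directly as N(prop, 1/n_obs).
                sim = rng.gauss(prop, 1.0 / n_obs ** 0.5)
                if abs(sim - xbar[j]) < eps:
                    mu[j] = prop
                    break
        # ABC step for alpha | mu: the only summary compared is the mean of the mu_j's.
        target = statistics.fmean(mu)
        while True:
            prop = rng.uniform(-4.0, 4.0)
            sim = statistics.fmean([rng.gauss(prop, 1.0) for _ in range(n_groups)])
            if abs(sim - target) < eps:
                alpha = prop
                break
        chain.append(alpha)
    return chain


# Hypothetical data: 5 groups of 20 observations generated with alpha = 1.
data_rng = random.Random(42)
true_mu = [data_rng.gauss(1.0, 1.0) for _ in range(5)]
xbar = [statistics.fmean([data_rng.gauss(m, 1.0) for _ in range(20)]) for m in true_mu]

chain = abc_gibbs(xbar, n_obs=20, n_iter=1_000, eps=0.1)
print(statistics.fmean(chain[200:]), statistics.fmean(xbar))
```

With a small tolerance, the average of the α chain should sit close to the mean of the group means, which is where the exact posterior of α concentrates in this toy model.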

“more Bayesian” GANs

Posted in Books, Statistics on December 21, 2018 by xi'an

On X validated, I was pointed to this recent paper by He, Wang, Lee and Tiang, which proposes a new form of Bayesian GAN, although I do not see it as really Bayesian, as explained below.

“[The] existing Bayesian method (Saatchi & Wilson, 2017) may lead to incompatible conditionals, which suggest that the underlying joint distribution actually does not exist.”

The difference with the Bayesian GANs of Saatchi & Wilson (2017) [with Saatchi’s name being consistently misspelled throughout] lies in the definition of the likelihood function, a function of both generative and discriminative parameters. As in Bissiri et al. (2013), the likelihood is replaced by the exponentiated loss function, or rather functions, which are computed with expected or plug-in distributions or discriminating functions. Expectations are taken under the respective priors and for the observed data (?). Meaning there are “two likelihoods” for the same data, one being the inverse of the other in the minimax GAN case. Further, the prior on the generative parameter is actually of the prior feedback category: at each iteration, the authors “use the generator distribution in the previous time step as a prior for the next time step”. Which makes me wonder how they avoid ending up with a Dirac “prior”. (Even curiouser, the prior on the discriminating parameter, which is not a genuine component of the statistical model, is a flat prior across iterations.) The convergence result established in the paper is the following: if the (true) data-generating model can be written as the marginal of the assumed parametric generative model against an “optimal” distribution, then the discriminating function converges to non-discrimination between the (true) data-generating model and the assumed parametric generative model. This somehow negates the Bayesian side of the affair, as convergence to a point mass does not produce a Bayesian outcome on the parameters of the model, or on the comparison between true and assumed models. The paper also demonstrates the incompatibility of the two conditionals used by Saatchi & Wilson (2017) and provides a formal example [missing any form of data?] where the associated Bayesian GAN does not converge to the true value behind the model.

But the issue is, in my opinion, more complex, in that using two incompatible conditionals does not imply that the associated Markov chain is transient (as, e.g., on a compact space).
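On a finite (hence compact) state space, the point is easy to check numerically. Here is a minimal sketch, assuming two arbitrary 2×2 conditional tables p(x|y) and q(y|x) of my own choosing. The systematic-scan Gibbs sampler built from them is a finite Markov chain on y with kernel K(y, y') = Σ_x p(x|y) q(y'|x). Such a chain always admits a stationary distribution, whether or not the two tables are compatible. The code computes that stationary distribution by power iteration and builds the joint it induces on (x, y). It then compares the conditional of x given y under that joint with the table used by the sampler. The tables and the helper name `stationary` are hypothetical.

```python
def stationary(K, n_steps=1000):
    """Stationary distribution of a finite Markov kernel K[i][j] by power iteration."""
    n = len(K)
    pi = [1.0 / n] * n
    for _ in range(n_steps):
        pi = [sum(pi[i] * K[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical tables: p[y][x] = P(X = x | Y = y), q[x][y] = P(Y = y | X = x).
p = [[0.9, 0.1], [0.2, 0.8]]
q = [[0.3, 0.7], [0.6, 0.4]]

# For strictly positive 2x2 tables, compatibility holds iff p(x|y)/q(y|x)
# factors as u(x)v(y), i.e. iff the cross ratio below equals 1.
r = [[p[y][x] / q[x][y] for y in range(2)] for x in range(2)]  # r[x][y]
cross_ratio = r[0][0] * r[1][1] / (r[0][1] * r[1][0])
print("cross ratio:", cross_ratio)

# Kernel of the chain on Y: draw X ~ p(.|Y), then Y' ~ q(.|X).
K = [[sum(p[y][x] * q[x][y2] for x in range(2)) for y2 in range(2)] for y in range(2)]
pi_y = stationary(K)

# Joint of (X, Y) after one full scan started at stationarity: m(x) q(y|x).
m_x = [sum(pi_y[y] * p[y][x] for y in range(2)) for x in range(2)]
joint = [[m_x[x] * q[x][y] for y in range(2)] for x in range(2)]

# Conditional of X given Y implied by that joint, indexed like p: implied[y][x].
implied = [[joint[x][y] / sum(joint[x2][y] for x2 in range(2)) for x in range(2)]
           for y in range(2)]
print("p(x|y) used:   ", p)
print("p(x|y) implied:", implied)
```

With these tables the chain converges geometrically fast to a well-defined distribution, even though no joint distribution has both tables as its conditionals. The implied P(X = 0 | Y = 0) comes out around 0.34 rather than the 0.9 fed to the sampler.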