Archive for Gibbs sampler

inference with insufficient statistics #2

Posted in Books, Kids, Statistics on December 28, 2023 by xi'an

Another X validated question of some interest: how to infer the parameters of a model when only given a fraction 1-α of the order statistics, for instance the (1-α)n largest observations. On a primary level, the answer is somewhat obvious since the joint density of these observations is available in closed form. At another level, it brings out the fact that the distribution of the unobserved part of the sample given the observed one only depends on the smallest observed order statistic ς (which is thus sufficient in that sense) and ends up being the original distribution truncated at ς, which allows for a closed-form EM implementation. Which is also interesting given that the moments of a Normal order statistic are not available in closed form. This reminded me of the insufficient Gibbs paper we wrote with Antoine and Robin a few months ago, except that the likelihood is available here. And provided fodder for the final exam of my introductory mathematical statistics course at Paris Dauphine.
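For what it's worth, here is a minimal R sketch of that EM recursion for a Normal sample of which only the m=(1-α)n largest observations are kept; the simulated example and variable names are mine, not taken from the exam.

  # EM for N(mu, sig^2) when only the m largest of n observations are available,
  # the missing part being the Normal truncated above at the smallest observed value
  set.seed(101)
  n <- 500; alpha <- .3
  m <- round((1 - alpha) * n)
  x <- sort(rnorm(n, 2, 3), decreasing = TRUE)[1:m]   # the m largest observations
  zeta <- min(x)                                      # smallest observed order statistic
  mu <- mean(x); sig <- sd(x)                         # crude starting values
  for (i in 1:100) {
    be <- (zeta - mu) / sig
    la <- dnorm(be) / pnorm(be)                       # inverse Mills ratio
    e1 <- mu - sig * la                               # E[X | X < zeta]
    e2 <- sig^2 * (1 - be * la - la^2) + e1^2         # E[X^2 | X < zeta]
    mu  <- (sum(x) + (n - m) * e1) / n                # M step from the completed sufficient statistics
    sig <- sqrt((sum(x^2) + (n - m) * e2) / n - mu^2)
  }
  c(mu, sig)

The E step only requires the first two moments of the Normal truncated at ζ, which are indeed available in closed form through the Mills ratio, hence the closed-form EM.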

truncated mixtures

Posted in Books, pictures, R, Statistics on May 4, 2022 by xi'an

A question on X validated about EM steps for a truncated Normal mixture led me to ponder whether or not a more ambitious completion [more ambitious than the standard component allocation] was appropriate. Namely, if the mixture is truncated to the interval (a,b), with an observed sample x of size n, this sample could be augmented into an untruncated sample y by latent samples over the complement of (a,b), with random sizes corresponding to the probabilities of falling within (-∞,a), (a,b), and (b,∞). In other words, y is made of three parts, including x, with sizes N¹, n, N³, respectively, the vector (N¹, n, N³) being a trinomial M(N⁺,p) random variable and N⁺ an extra unknown in the model. Assuming a (pseudo-) conjugate prior, an approximate Gibbs sampler can be run (by ignoring the dependence of p on the mixture parameters!). I did not go as far as implementing the idea for the mixture, but had a quick try for a simple truncated Normal. And did not spot any explosive behaviour in N⁺, which is what I was worried about. Of course, this is mostly anecdotal since the completion does not bring a significant improvement in coding or convergence (the plots correspond to 10⁴ simulations, for a sample of size n=400).
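For the simple truncated Normal case, the quick try looked more or less like the following R sketch, where the Poisson pseudo-prior on N⁺ and all numerical values are illustrative choices of mine rather than taken from the original experiment:

  # approximate completion Gibbs sampler for a single Normal N(mu, sig^2)
  # truncated to (a,b), with unknown untruncated sample size N+
  set.seed(42)
  a <- -1; b <- 2; n <- 400
  x <- qnorm(runif(n, pnorm(a, .5, 1.5), pnorm(b, .5, 1.5)), .5, 1.5)  # N(.5,1.5^2) truncated to (a,b)
  lam <- 2 * n                       # Poisson(lam) pseudo-prior on N+, my arbitrary pick
  niter <- 1e4
  mu <- mean(x); sig <- sd(x); keep <- matrix(0, niter, 3)
  for (t in 1:niter) {
    p1 <- pnorm(a, mu, sig); p3 <- 1 - pnorm(b, mu, sig); p2 <- 1 - p1 - p3
    # completion step, ignoring the dependence of p on (mu, sig)
    M  <- rpois(1, lam * (1 - p2))                     # draws falling outside (a,b)
    N1 <- rbinom(1, M, p1 / (p1 + p3)); N3 <- M - N1
    y  <- c(x, qnorm(runif(N1, 0, p1), mu, sig),       # latent values below a
               qnorm(runif(N3, 1 - p3, 1), mu, sig))   # latent values above b
    # (mu, sig) update given the completed sample, under a Jeffreys-type prior
    sig <- sqrt(sum((y - mean(y))^2) / rchisq(1, length(y) - 1))
    mu  <- rnorm(1, mean(y), sig / sqrt(length(y)))
    keep[t, ] <- c(mu, sig, n + M)
  }

With a Poisson(λ) pseudo-prior and p held fixed within the step, the count of draws falling outside (a,b) remains Poisson with rate λ(1-p₂) given the number inside, which is what makes the completion step immediate.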

efficiency of normalising over discrete parameters

Posted in Statistics on May 1, 2022 by xi'an

Yesterday, I noticed a new arXival entitled Investigating the efficiency of marginalising over discrete parameters in Bayesian computations, written by Wen Wang and coauthors. The paper is actually comparing the simulation of a Gibbs sampler with a Hamiltonian Monte Carlo approach on Gaussian mixtures, when including and marginalising out the latent allocation variables, respectively. The authors missed the opposite marginalisation, where the parameters themselves are integrated out.

While marginalisation requires substantial mathematical effort, folk wisdom in the Stan community suggests that fitting models with marginalisation is more efficient than using Gibbs sampling.

The comparison is purely experimental, though, which means it depends on the simulated data, the sample size, the prior selection, and of course the chosen algorithms. It also involves the [mostly] automated [off-the-shelf] choices made in the adopted software, JAGS and Stan. The outcome is only evaluated through ESS and the (old) R̂ statistic, which both depend on the parameterisation. The comparison also evacuates the label switching problem by imposing an ordering on the Gaussian means, which may have a different impact on marginalised and unmarginalised models. All in all, there is not much one can conclude from this experiment since the parameter values used to simulate the data seem to impact the performances much more than the type of algorithm one implements.
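For readers not fluent in the marginalisation jargon, integrating out the discrete component indicators simply turns the completed likelihood into a log-sum-exp expression, as in this illustrative R comparison (mine, not the paper's code):

  # observed-data log-likelihood of a K-component Gaussian mixture, allocations
  # summed out (the version coded in Stan), versus the completed log-likelihood
  # a Gibbs sampler works with, given allocations z in {1,...,K}
  marg_loglik <- function(x, w, mu, sig) {
    lp <- sapply(seq_along(w),
                 function(k) log(w[k]) + dnorm(x, mu[k], sig[k], log = TRUE))
    sum(apply(lp, 1, function(v) max(v) + log(sum(exp(v - max(v))))))  # log-sum-exp per datum
  }
  comp_loglik <- function(x, z, w, mu, sig)
    sum(log(w[z]) + dnorm(x, mu[z], sig[z], log = TRUE))

The HMC run targets the posterior associated with the first function in (w, μ, σ) alone, while the Gibbs sampler alternates between simulating z and simulating the parameters from the second.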

mixed feelings

Posted in Books, Kids, Statistics on September 9, 2021 by xi'an

Two recent questions on X validated about mixtures:

  1. One on the potential divergence to -∞ of the E function in the EM algorithm for a mixture of components with different supports: "I was hoping to use the EM algorithm to fit a mixture model in which the mixture components can have differing support. I've run into a problem during the M step because the expected log-likelihood can be [minus] infinite." Which mistake is based on a confusion between the current parameter estimate and the free parameter to optimise.
  2. Another one on the Gibbs sampler apparently failing for a two-component mixture with only the weights unknown, when the components are close to one another: "The algorithm works fine if σ is far from 1 but it does not work anymore for σ close to 1." Which question did not see a wide, near-flat posterior as a perfectly acceptable posterior when both components are similar and hence delicate to distinguish from one another, as the sketch below illustrates.
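To make the second point concrete, here is a bare-bones R version of such a Gibbs sampler, assuming (as the question seems to imply) two zero-mean Normal components with scales 1 and σ, and only the weight w unknown:

  # Gibbs sampler for the weight w in w N(0,1) + (1-w) N(0,sigma^2), sigma known
  set.seed(7)
  n <- 500; sigma <- 1.05; w0 <- .3
  x <- rnorm(n, 0, ifelse(runif(n) < w0, 1, sigma))
  niter <- 1e4; w <- numeric(niter); w[1] <- .5
  for (t in 2:niter) {
    p1 <- w[t - 1] * dnorm(x, 0, 1)
    p2 <- (1 - w[t - 1]) * dnorm(x, 0, sigma)
    z  <- (runif(n) < p1 / (p1 + p2))                # latent allocations to component 1
    w[t] <- rbeta(1, 1 + sum(z), 1 + n - sum(z))     # conjugate update under a U(0,1) prior on w
  }
  hist(w[-(1:1000)])

For σ close to 1 the chain mixes perfectly well, but the histogram of w remains close to the flat prior: there is simply little information in the data to separate the components, rather than a failure of the Gibbs sampler.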


invertible flow non equilibrium sampling (InFiNE)

Posted in Books, Statistics, University life on May 21, 2021 by xi'an

With Achille Thin and a few other coauthors [and friends], we just arXived a paper on a new form of importance sampling, motivated by a recent paper of Rotskoff and Vanden-Eijnden (2019) on non-equilibrium importance sampling. The central ideas of this earlier paper are the introduction of conformal Hamiltonian dynamics, where a dissipative term is added to the ODE found in HMC, namely

\dfrac{\mathrm{d} p_t}{\mathrm{d} t} = -\dfrac{\partial}{\partial q}H(q_t,p_t) - \gamma p_t = -\nabla U(q_t) - \gamma p_t

which means that all orbits converge to fixed points satisfying ∇U(q) = 0, as the energy eventually vanishes. And the property that, were T a conformal Hamiltonian integrator associated with H, i.e. preserving the invariant measure, averaging over the orbits of T would improve the precision of unbiased Monte Carlo estimators, while remaining unbiased. The fact that Rotskoff and Vanden-Eijnden (2019) considered only continuous time makes their proposal hard to implement without adding approximation error, while our approach is directly set in discrete time and preserves unbiasedness. And since measure-preserving transforms are too difficult to come by, a change-of-variable correction, as in normalising flows, allows for an arbitrary choice of T while keeping the estimator unbiased. The use of conformal maps makes for a natural choice of T in this context.
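As a toy visualisation of these dissipative dynamics (and not of the integrator used in the paper), a crude splitting scheme in R for U(q)=q²/2 exhibits the collapse of the orbit onto the fixed point:

  # rough discretisation of dq/dt = p, dp/dt = -grad U(q) - gamma p, with U(q) = q^2/2
  gradU <- function(q) q
  h <- .05; gam <- .5; q <- 3; p <- 0
  orbit <- matrix(0, 500, 2)
  for (k in 1:500) {
    p <- exp(-gam * h) * (p - h * gradU(q))   # gradient kick, then exact friction decay
    q <- q + h * p
    orbit[k, ] <- c(q, p)
  }
  plot(orbit, type = "l", xlab = "q", ylab = "p")   # spiral into (0,0), where grad U vanishes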

The resulting InFiNE algorithm is a particular MCMC algorithm which can be represented as a partially collapsed Gibbs sampler when using the right auxiliary variables, as in Andrieu, Doucet and Holenstein (2010) and their ISIR algorithm. The algorithm can be used for estimating normalising constants, comparing favourably with AIS, sampling from complex targets, and optimising variational autoencoders and their ELBO.

I really appreciated working on this project, with links to earlier notions like multiple importance sampling à la Owen and Zhou (2000), nested sampling, non-homogeneous normalising flows, and measure estimation à la Kong et al. (2002), on which I worked in a more or less distant past.