Archive for Gibbs sampler

truncated mixtures

Posted in Books, pictures, R, Statistics with tags completion, cross validated, EM, expectation maximisation, Gibbs sampler, R on May 4, 2022 by xi'an

A question on X validated about EM steps for a truncated Normal mixture led me to ponder whether or not a more ambitious completion [more ambitious than the standard component allocation] was appropriate. Namely, if the mixture is truncated to the interval (a,b), with an observed sample x of size n, this sample could be augmented into an untruncated sample y by latent samples over the complement of (a,b), with random sizes corresponding to the probabilities of falling within (-∞,a), (a,b), and (b,∞). In other words, y is made of three parts, including x, with sizes N¹, n, N³, respectively, the vector (N¹, n, N³) being a trinomial M(N⁺,p) random variable and N⁺ an extra unknown in the model. Assuming a (pseudo-) conjugate prior, an approximate Gibbs sampler can be run (by ignoring the dependence of p on the mixture parameters!). I did not go as far as implementing the idea for the mixture, but had a quick try for a simple truncated Normal. And did not spot any explosive behaviour in N⁺, which is what I was worried about. Of course, this is mostly anecdotal since the completion does not bring a significant improvement in coding or convergence (the plots correspond to 10⁴ simulations, for a sample of size n=400).
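For the simple truncated Normal case, a minimal R rendering of this approximate Gibbs sampler may look as follows, assuming for the sketch a Poisson(λ) pseudo-prior on N⁺ and standard flat-prior conditionals for (μ,σ), rather than the exact code behind the plots:

```r
# minimal sketch: approximate Gibbs sampler for a Normal N(mu,sig²) truncated to (a,b),
# completing the observed sample with latent points outside (a,b) and an unknown total
# size N+, under a Poisson(lambda) pseudo-prior on N+ (an assumption made for this sketch)
set.seed(101)
a <- -1; b <- 2; n <- 400
x <- rnorm(10 * n, 1, 2)
x <- x[x > a & x < b][1:n]            # truncated Normal sample by accept-reject

lambda <- n                            # pseudo-prior mean for N+
niter <- 1e4
mu <- mean(x); sig <- sd(x)
keep <- matrix(0, niter, 3)            # stores (mu, sig, N+)
for (t in 1:niter){
  # probabilities of the three regions under the current (mu, sig)
  p1 <- pnorm(a, mu, sig)
  p3 <- pnorm(b, mu, sig, lower.tail = FALSE)
  p2 <- 1 - p1 - p3
  # update of N+ ignoring the dependence of p on (mu, sig): N+ - n is then Poisson
  Nout <- rpois(1, lambda * (1 - p2))
  N1 <- rbinom(1, Nout, p1 / (p1 + p3))
  N3 <- Nout - N1
  Nplus <- n + Nout
  # imputation of the latent tail observations by inverse cdf sampling
  y1 <- qnorm(runif(N1, 0, p1), mu, sig)
  y3 <- qnorm(runif(N3, 1 - p3, 1), mu, sig)
  y <- c(x, y1, y3)
  # standard conditional draws of (sig, mu) given the completed sample y
  sig <- sqrt(sum((y - mean(y))^2) / rchisq(1, Nplus - 1))
  mu  <- rnorm(1, mean(y), sig / sqrt(Nplus))
  keep[t, ] <- c(mu, sig, Nplus)
}
```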
efficiency of normalising over discrete parameters

Posted in Statistics with tags arXiv, Gibbs sampler, Hamiltonian Monte Carlo, JAGS, latent variable models, marginalisation, MCMC, mixtures of distributions, Monte Carlo experiment, STAN on May 1, 2022 by xi'an

Yesterday, I noticed a new arXival entitled Investigating the efficiency of marginalising over discrete parameters in Bayesian computations written by Wen Wang and coauthors. The paper is actually comparing the simulation of a Gibbs sampler with a Hamiltonian Monte Carlo approach on Gaussian mixtures, when including and excluding the latent variables, respectively. The authors missed the opposite marginalisation, when the parameters themselves are integrated out.
While marginalisation requires substantial mathematical effort, folk wisdom in the Stan community suggests that fitting models with marginalisation is more efficient than using Gibbs sampling.
The comparison is purely experimental, though, which means it depends on the simulated data, the sample size, the prior selection, and of course the chosen algorithms. It also involves the [mostly] automated [off-the-shelf] choices made in the adopted software, JAGS and Stan. The outcome is only evaluated through ESS and the (old) R statistic. Which all depend on the parameterisation. And the study evacuates the label switching problem by imposing an ordering on the Gaussian means, which may have a different impact on marginalised and unmarginalised models. All in all, there is not much one can conclude from this experiment since the parameter values, beyond the simulated data, seem to impact the performances much more than the type of algorithm one implements.
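For concreteness, marginalising over the discrete parameters here means summing the latent allocations out of the Gaussian mixture likelihood, via a log-sum-exp, rather than conditioning on simulated allocations as a Gibbs sampler does. A bare-bones R illustration for a two-component mixture (mine, not the paper's code):

```r
# marginal log-likelihood of a two-component Gaussian mixture: the latent
# allocations z_i are summed out by a (numerically stable) log-sum-exp
marg_loglik <- function(x, w, mu, sig){
  lp <- cbind(log(w[1]) + dnorm(x, mu[1], sig[1], log = TRUE),
              log(w[2]) + dnorm(x, mu[2], sig[2], log = TRUE))
  m <- apply(lp, 1, max)
  sum(m + log(rowSums(exp(lp - m))))
}
# versus the complete-data log-likelihood used within a Gibbs sampler,
# which conditions on sampled allocations z in {1,2}
comp_loglik <- function(x, z, w, mu, sig)
  sum(log(w[z]) + dnorm(x, mu[z], sig[z], log = TRUE))

x <- c(rnorm(150), rnorm(250, 3))       # toy data from a two-component mixture
z <- c(rep(1, 150), rep(2, 250))        # the (usually latent) allocations
marg_loglik(x, c(.5, .5), c(0, 3), c(1, 1))
comp_loglik(x, z, c(.5, .5), c(0, 3), c(1, 1))
```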
mixed feelings
Posted in Books, Kids, Statistics with tags cross validated, disjoint support, EM algorithm, Gibbs sampler, mixtures of distributions on September 9, 2021 by xi'an

Two recent questions on X validated about mixtures:
- One on the potential negative explosion of the E function in the EM algorithm for a mixture of components with different supports: "I was hoping to use the EM algorithm to fit a mixture model in which the mixture components can have differing support. I've run into a problem during the M step because the expected log-likelihood can be [minus] infinite." Which mistake is based on a confusion between the current parameter estimate (under which the expectation is computed in the E step) and the free parameter to optimise in the M step.
- Another one on the Gibbs sampler apparently failing for a two-component mixture with only the weights unknown, when the components are close to one another: "The algorithm works fine if σ is far from 1 but it does not work anymore for σ close to 1." Which did not recognise a wide posterior as the natural outcome when both components are similar and hence delicate to distinguish from one another, as illustrated by the toy sketch below.
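To wit, a toy Gibbs sampler for the weight p of the mixture p N(0,1) + (1−p) N(0,σ²), with both components known and a Beta(1,1) prior on p (my own rendering of the situation, not the OP's code), returns a posterior close to the flat prior when σ is close to 1, which is the expected behaviour rather than a failure of the sampler:

```r
set.seed(202)
n <- 500; sigma <- 1.05                  # try sigma = 3 for a much tighter posterior
z0 <- rbinom(n, 1, .3)                   # true weight .3
x <- rnorm(n, 0, ifelse(z0 == 1, 1, sigma))
niter <- 1e4; p <- .5; keep <- numeric(niter)
for (t in 1:niter){
  w1 <- p * dnorm(x, 0, 1)
  w2 <- (1 - p) * dnorm(x, 0, sigma)
  z <- rbinom(n, 1, w1 / (w1 + w2))      # latent allocations
  p <- rbeta(1, 1 + sum(z), 1 + n - sum(z))  # conjugate update under a Beta(1,1) prior
  keep[t] <- p
}
quantile(keep[-(1:500)], c(.025, .5, .975))  # wide interval when sigma is near 1
```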
invertible flow non equilibrium sampling (InFiNE)
Posted in Books, Statistics, University life with tags auxiliary variable, conformal Hamiltonian dynamics, energy, Gibbs sampler, Hamiltonian Monte Carlo, HMC, MCMC, multiple importance sampling, multiple mixtures, nested sampling, ODE, particle MCMC, unbiasedness, variational autoencoders on May 21, 2021 by xi'an

With Achille Thin and a few other coauthors [and friends], we just arXived a paper on a new form of importance sampling, motivated by a recent paper of Rotskoff and Vanden-Eijnden (2019) on non-equilibrium importance sampling. The central ideas of this earlier paper are the introduction of conformal Hamiltonian dynamics, where a dissipative term is added to the ODE found in HMC, namely

dq/dt = ∇_p H(q,p),   dp/dt = −∇_q H(q,p) − γp,   with H(q,p) = U(q) + ‖p‖²/2 and friction γ > 0,
which means that all orbits converge to fixed points that satisfy ∇U(q) = 0 as the energy eventually vanishes. And the property that, were T a conformal Hamiltonian integrator associated with H, i.e. preserving the invariant measure, averaging over orbits of T would improve the precision of Monte Carlo unbiased estimators, while remaining unbiased. The fact that Rotskoff and Vanden-Eijnden (2019) considered only continuous time makes their proposal hard to implement without adding approximation error, while our approach is directly set in discrete time and preserves unbiasedness. And since measure-preserving transforms are too difficult to come by, a change of variable correction, as in normalising flows, allows for an arbitrary choice of T, while keeping the estimator unbiased. The use of conformal maps makes for a natural choice of T in this context.
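As a one-dimensional toy illustration of this change of variable correction (not the InFiNE estimator itself, and with an arbitrary affine map rather than a conformal Hamiltonian integrator), pushing draws from a proposal q through an invertible map T and multiplying by the Jacobian leaves the importance sampling estimator of an integral against the target unbiased:

```r
q_dens  <- function(x) dnorm(x, 0, 2)          # proposal density
pi_dens <- function(x) dnorm(x, 1, 1)          # target density (normalised, for checking)
Tmap    <- function(x) x / 2 + 1               # arbitrary invertible map, |T'(x)| = 1/2
f       <- function(x) x^2
set.seed(303)
x <- rnorm(1e6, 0, 2)                          # x ~ q
# plain importance sampling estimate of the integral of f against pi
mean(f(x) * pi_dens(x) / q_dens(x))
# pushing the draws through T and correcting by the Jacobian returns the same
# integral, since int f(y) pi(y) dy = int f(T(x)) pi(T(x)) |T'(x)| dx
mean(f(Tmap(x)) * pi_dens(Tmap(x)) * 0.5 / q_dens(x))
# both are close to the exact value E[x²] under N(1,1), namely 1 + 1² = 2
```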
The resulting InFiNE algorithm is a particular MCMC algorithm which can be represented as a partially collapsed Gibbs sampler when using the right auxiliary variables. As in Andrieu, Doucet and Holenstein (2010) and their ISIR algorithm. The algorithm can be used for estimating normalising constants, comparing favourably with AIS, sampling from complex targets, and optimising variational autoencoders and their ELBO.
I really appreciated working on this project, with links to earlier notions like multiple importance sampling à la Owen and Zhou (2000), nested sampling, non-homogeneous normalising flows, measure estimation à la Kong et al. (2002), on which I worked in a more or less distant past.