## continuous herded Gibbs sampling

Posted in Books, pictures, Statistics on June 28, 2021 by xi'an

Read a short paper by Laura Wolf and Marcus Baum on Gibbs herding, where herding is a technique of “deterministic sampling”, for instance selecting points over the support of the distribution by matching exact and empirical (or “empirical”!) moments. Which reminds me of the principal points devised by my late friend Bernhard Flury. With an unclear argument as to why it could take over random sampling:

“random numbers are often generated by pseudo-random number generators, hence are not truly random”

Especially since the aim is to “draw samples from continuous multivariate probability densities.” The construction of such a sample proceeds sequentially, adding a new (T+1)-th point to the existing sample of y’s by maximising in x the discrepancy

$(T+1)\mathbb E^Y[k(x,Y)]-\sum_{t=1}^T k(x,y_t)$

where k(·,·) is a kernel, e.g. a Gaussian density. Hence a complexity that grows as O(T). The current paper suggests using Gibbs “sampling” to update one component of x at a time. Using the conditional version of the above discrepancy. Making the complexity grow as O(dT) in d dimensions.
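As a minimal sketch of the plain (non-Gibbs) herding step above, here is a one-dimensional toy version. It assumes a standard Normal target and a Gaussian kernel with bandwidth `h`, for which the expectation of k(x,Y) is available in closed form, and it replaces the maximisation in x by a search over a fixed grid. The names `kernel_mean` and `herd`, the grid, and the bandwidth are all illustrative choices of mine, not taken from the paper.

```python
import numpy as np

def kernel_mean(x, h=1.0):
    """E[k(x, Y)] for Y ~ N(0, 1) and k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    return np.sqrt(h**2 / (h**2 + 1.0)) * np.exp(-x**2 / (2.0 * (h**2 + 1.0)))

def herd(n_points, h=1.0, grid=None):
    """Deterministic herding on a grid (assumed setting: N(0, 1) target)."""
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 2001)  # hypothetical search grid
    expected_kernel = kernel_mean(grid, h)
    sample = []
    for t in range(n_points):
        ys = np.asarray(sample, dtype=float)
        # sum_t k(x, y_t) at every grid value x: each evaluation costs O(T)
        repulsion = np.exp(-(grid[:, None] - ys[None, :]) ** 2 / (2.0 * h**2)).sum(axis=1)
        # discrepancy (T+1) E[k(x, Y)] - sum_t k(x, y_t), maximised over the grid
        discrepancy = (t + 1) * expected_kernel - repulsion
        sample.append(grid[np.argmax(discrepancy)])
    return np.array(sample)

points = herd(50)
print(points[:5], points.mean(), points.std())
```

The first point sits at the mode of the target and each later point is pushed away from the earlier ones by the kernel sum, which is where the O(T) cost per evaluation comes from. Nothing random is involved, so rerunning the function returns exactly the same points.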

I remain puzzled by the whole thing as these samples cannot be used as regular random or quasi-random samples. And in particular do not produce unbiased estimators of anything. Obviously. The production of such samples being furthermore computationally costly, it is also unclear to me that they could even be used for quick & dirty approximations of a target sample.

## EM degeneracy

Posted in pictures, Statistics, Travel, University life on June 16, 2021 by xi'an

At the MHC 2021 conference today (to which I biked to attend for real!, first time since BayesComp!) I listened to Christophe Biernacki exposing the dangers of EM applied to mixtures in the presence of missing data, namely that the algorithm has a rising probability of reaching a degenerate solution, that is, a single-observation component. Rising with the proportion of missing data. This is not hugely surprising as there is a real (global) mode at this solution. If one-observation components are prohibited, they should not be accepted in the EM update. Just as in Bayesian analyses with improper priors, the likelihood should bar single- or double-observation components… Which of course makes EM harder to implement. Or not?! MCEM, SEM and Gibbs are obviously straightforward to modify in this case.
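To make the degenerate mode concrete, here is a small sketch in the simplest setting I could think of, namely a two-component Gaussian mixture with no missing data (an assumption of mine, the talk being about the missing-data case). One component is centred at the first observation with standard deviation `sigma1`, the other is a fixed N(0,1), and the weights are fixed at one half. The data and the function names are made up for the illustration.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_loglik(x, sigma1):
    """Log-likelihood of 0.5 N(x[0], sigma1^2) + 0.5 N(0, 1) on the sample x."""
    spike = normal_pdf(x, x[0], sigma1)  # component sitting on one observation
    bulk = normal_pdf(x, 0.0, 1.0)       # component covering the other points
    return np.sum(np.log(0.5 * spike + 0.5 * bulk))

x = np.array([-1.2, -0.3, 0.1, 0.8, 2.5])  # made-up data
logliks = [mixture_loglik(x, s) for s in (1e-2, 1e-4, 1e-8)]
print(logliks)
```

The log-likelihood grows like minus the logarithm of `sigma1` as the spike shrinks, without any bound, which is the solution a prohibition on one-observation components would have to exclude.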

Judith Rousseau also gave a fascinating talk on the properties of non-parametric mixtures, from a surprisingly light set of conditions for identifiability to posterior consistency. With an interesting use of several priors simultaneously that is a particular case of the cut models. Namely a correct joint distribution that cannot be a posterior, although this does not impact simulation issues. And a nice trick turning a hidden Markov chain into a fully finite hidden Markov chain as it is sufficient to recover a Bernstein-von Mises asymptotic. If inefficient. Sylvain LeCorff presented a pseudo-marginal sequential sampler for smoothing, when the transition densities are replaced by unbiased estimators. With a connection to approximate Bayesian computation smoothing. This proves harder than I first imagined because of the backward-sampling operations…

## latent variables for a hierarchical Poisson model

Posted in Books, Kids, pictures, Statistics, University life on March 11, 2021 by xi'an

Answering a question on X validated about a rather standard hierarchical Poisson model, and its posterior Gibbs simulation, where observations are (d and w being a document and a word index, resp.)

$N_{w,d}\sim\mathcal P(\textstyle\sum_{1\le k\le K} \pi_{k,d}\varphi_{k,w})\qquad(1)$

I found myself dragged into an extended discussion on the validity of creating independent Poisson latent variables

$N_{k,w,d}\sim\mathcal P(\pi_{k,d}\varphi_{k,w})\qquad(2)$

since observing their sum in (1) was preventing the latent variables (2) from being independent. And then found out that the originator of the question had asked on X validated an unanswered and much more detailed question in 2016, even though the notations differ. The question does contain the solution I proposed above, including the Multinomial distribution on the Poisson latent variables given their sum (and the true parameters). As it should be since the derivation was done in a linked 2014 paper by Gopalan, Hofman, and Blei, later published in the Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI). I am thus bemused at the question resurfacing five years later in a much simplified version, but still exhibiting the same difficulty with the conditioning principles…
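The conditioning argument can be sketched as a single Gibbs step: given the observed count in (1) and the current parameters, the K latent counts in (2) are jointly Multinomial, with probabilities proportional to the individual Poisson rates. The function name `sample_latent_counts` and the numerical values are hypothetical, chosen only to make the snippet runnable.

```python
import numpy as np

def sample_latent_counts(n_wd, pi_d, phi_w, rng):
    """Draw (N_{1,w,d}, ..., N_{K,w,d}) given their sum N_{w,d} = n_wd.

    pi_d[k] and phi_w[k] stand for pi_{k,d} and phi_{k,w}; conditional on the
    sum, the independent Poisson variables (2) are Multinomial with
    probabilities proportional to the rates pi_{k,d} * phi_{k,w}.
    """
    rates = np.asarray(pi_d) * np.asarray(phi_w)
    return rng.multinomial(n_wd, rates / rates.sum())

rng = np.random.default_rng(0)
pi_d = np.array([0.5, 1.5, 2.0])   # made-up values of pi_{k,d}, K = 3
phi_w = np.array([1.0, 0.2, 0.4])  # made-up values of phi_{k,w}
counts = sample_latent_counts(7, pi_d, phi_w, rng)
print(counts, counts.sum())
```

The latent counts are independent a priori and only become dependent once their sum is observed, which is exactly what the Multinomial draw encodes.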

## freedom prior

Posted in Books, Kids, Statistics on December 9, 2020 by xi'an

Another X validated question on which I spent more time than expected. Because of the somewhat unusual parameterisation used in BDA for the inverse χ² distribution. The interest behind the question is in the induced distribution on the parameter associated with the degrees of freedom ν of the t-distribution (a question that coincided with my last modifications of my undergraduate mathematical statistics exam, involving a t sample). Whichever the prior chosen on ν, the posterior involves a nasty term

$\pi(\nu)\frac{\nu^{n\nu/2}}{\Gamma(\nu/2)^n}\,(v_1\cdots v_n)^{-\nu/2-1}\exp\Big\{-\nu\sigma^2\sum_{i=1}^n 1\big/2v_i\Big\}$

as the Gamma function there is quickly explosive (as can be checked with Stirling’s formula). Unless the prior π(ν) cancels this term, which is rather fishy as the prior would then depend on the sample size n. Even though the whole posterior is well-defined (and hence non-explosive). Rather than seeking a special prior π(ν) for computation purposes, I would thus favour a modelling restricted to integer valued ν’s as there is not much motivation in inferring about non-integer degrees of freedom.
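With integer degrees of freedom the posterior on ν becomes a finite table, which can be computed on the log scale so that the Gamma function never gets a chance to overflow. The sketch below assumes that the v's are scaled inverse χ²(ν, σ²) variates in the BDA parameterisation, written with all the constants that depend on ν, and it takes a uniform prior on {1, ..., `nu_max`}. The prior, the truncation, the data, and the function names are my own illustrative choices.

```python
import math

def log_scaled_inv_chi2(v, nu, sigma2):
    """Log-density of the scaled inverse chi-square(nu, sigma2) at v (BDA form)."""
    half = nu / 2.0
    return (half * math.log(half) - math.lgamma(half)
            + half * math.log(sigma2)
            - (half + 1.0) * math.log(v)
            - nu * sigma2 / (2.0 * v))

def integer_nu_posterior(v, sigma2, nu_max=50):
    """Posterior probabilities of nu = 1, ..., nu_max under a uniform prior."""
    log_post = [sum(log_scaled_inv_chi2(vi, nu, sigma2) for vi in v)
                for nu in range(1, nu_max + 1)]
    top = max(log_post)  # subtract the maximum before exponentiating
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

v = [0.6, 1.4, 0.9, 2.3, 1.1]  # made-up latent variances
posterior = integer_nu_posterior(v, sigma2=1.0)
print(max(range(len(posterior)), key=posterior.__getitem__) + 1)  # posterior mode of nu
```

Working with `math.lgamma` and subtracting the largest log-weight keeps every intermediate quantity finite, whatever the sample size n.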

## my talk in Newcastle

Posted in Mountains, pictures, Running, Statistics, Travel, University life on November 13, 2020 by xi'an

I will be talking (or rather zooming) at the statistics seminar at the University of Newcastle this afternoon on the paper Component-wise approximate Bayesian computation via Gibbs-like steps that just got accepted by Biometrika (yay!). Sadly not being there for real, as I would have definitely enjoyed reuniting with friends and visiting again this multi-layered city after discovering it for the RSS meeting of 2013, which I attended along with Jim Hobert and where I re-discussed the re-Read DIC paper. Before traveling south to Warwick to start my new appointment there. (I started with a picture of Seoul taken from the slopes of Gwanaksan about a year ago as a reminder of how much had happened or failed to happen over the past year… Writing 2019 as the year was unintentional but reflected as well on the distortion of time induced by the lockdowns!)