Archive for conjugate priors

distributed evidence

Posted in Books, pictures, Statistics, University life on December 16, 2021 by xi'an

Alexander Buchholz (who did his PhD at CREST with Nicolas Chopin), Daniel Ahfock, and my friend Sylvia Richardson published a great paper on the distributed computation of Bayesian evidence in Bayesian Analysis. The setting is one of distributed data from several sources with no communication between them, which relates to consensus Monte Carlo even though model choice has not been particularly studied from that perspective. The authors operate under the assumption of conditionally conjugate models, i.e., the existence of a data augmentation scheme into an exponential family so that conjugate priors can be used. For a division of the data into S blocks, the fundamental identity in the paper is

p(y) = \alpha^S \prod_{s=1}^S \tilde p(y_s) \int \prod_{s=1}^S \tilde p(\theta|y_s)\,\mathrm{d}\theta

where α is the normalising constant of the sub-prior exp{log[p(θ)]/S} and the other terms are associated with this prior. Under the conditionally conjugate assumption, the integral can be approximated based on the latent variables. Most interestingly, the associated variance is directly connected with the variance of

p(z_{1:S}|y)\Big/\prod_{s=1}^S \tilde p(z_s|y_s)

under the joint:

“The variance of the ratio measures the quality of the product of the conditional sub-posterior as an importance sample proposal distribution.”
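As a quick sanity check of the identity (my own toy illustration, not the authors' implementation), consider a conjugate Normal-mean model where the sub-prior, sub-evidences, and sub-posteriors are all available in closed form, so that the split estimate of p(y) can be compared with the exact marginal likelihood; the shard split, parameter values, and variable names below are made up for the sketch.

import numpy as np
from scipy import stats
from scipy.integrate import quad

# toy check of the distributed-evidence identity on a conjugate Normal model:
# y_i ~ N(theta, sigma2) with a N(mu0, tau2) prior on theta, data split into S shards
rng = np.random.default_rng(0)
sigma2, mu0, tau2 = 1.0, 0.0, 4.0
S, n_per_shard = 4, 25
shards = [rng.normal(0.7, np.sqrt(sigma2), n_per_shard) for _ in range(S)]
y = np.concatenate(shards)

# sub-prior exp{log p(theta)/S}: for a N(mu0, tau2) prior this is an unnormalised
# N(mu0, S*tau2) density, with normalising constant alpha
log_alpha = 0.5 * np.log(2 * np.pi * S * tau2) - 0.5 / S * np.log(2 * np.pi * tau2)

def sub_evidence_and_posterior(ys):
    # marginal likelihood and posterior moments of one shard under the sub-prior N(mu0, S*tau2)
    n = len(ys)
    cov = sigma2 * np.eye(n) + S * tau2 * np.ones((n, n))
    log_pt_y = stats.multivariate_normal.logpdf(ys, mean=mu0 * np.ones(n), cov=cov)
    v = 1.0 / (n / sigma2 + 1.0 / (S * tau2))        # sub-posterior variance
    m = v * (ys.sum() / sigma2 + mu0 / (S * tau2))   # sub-posterior mean
    return log_pt_y, m, v

logs, means, variances = zip(*(sub_evidence_and_posterior(ys) for ys in shards))

# integral of the product of the S sub-posterior densities (theta is scalar, so quadrature will do)
def log_prod_subpost(theta):
    return sum(stats.norm.logpdf(theta, m, np.sqrt(v)) for m, v in zip(means, variances))

shift = log_prod_subpost(np.average(means))          # guard against underflow inside quad
integral, _ = quad(lambda t: np.exp(log_prod_subpost(t) - shift), -10, 10)
log_py_split = S * log_alpha + sum(logs) + np.log(integral) + shift

# reference: full-data marginal likelihood under the original N(mu0, tau2) prior
n = len(y)
log_py_full = stats.multivariate_normal.logpdf(
    y, mean=mu0 * np.ones(n), cov=sigma2 * np.eye(n) + tau2 * np.ones((n, n)))

print(log_py_split, log_py_full)                     # the two should agree up to quadrature error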

Assuming this variance is finite (which is likely), an approximate alternative is proposed, namely to replace the exact sub-posteriors with Normal distributions, as in consensus Monte Carlo. This obviously requires some consideration as to which parameterisation of the model produces the “most normal” (or the least abnormal!) posterior, a problem shared with the bridgesampling package. The Normal replacement also ensures a finite variance in the importance sampling approximation, thanks to the strong bounds in Proposition 5.

“…if the error that comes from MCMC sampling is relatively small and that the shard sizes are large enough so that the quality of the subposterior normal approximation is reasonable, our suggested approach will result in good approximations of the full data set marginal likelihood.”
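When each sub-posterior is replaced with a Normal approximation, the integral of the product of the S approximate densities no longer calls for importance sampling, since a product of Gaussian densities is an unnormalised Gaussian with a closed-form mass; a minimal sketch for a scalar parameter (notation and inputs mine, not the paper's) is

import numpy as np

def log_integral_of_gaussian_product(means, variances):
    # log of the integral over theta of prod_s N(theta; m_s, v_s), theta scalar:
    # the product is an unnormalised Gaussian whose total mass is available in closed form
    means, variances = np.asarray(means), np.asarray(variances)
    a = np.sum(1.0 / variances)                  # precision of the product
    b = np.sum(means / variances)
    c = np.sum(means ** 2 / variances)
    return (-0.5 * (c - b ** 2 / a)              # value of the exponent at the product mode
            - 0.5 * np.sum(np.log(2 * np.pi * variances))
            + 0.5 * np.log(2 * np.pi / a))       # Gaussian integral over theta

# e.g., with sub-posterior moments (m_s, v_s) estimated from each shard's MCMC output
print(log_integral_of_gaussian_product([0.6, 0.8, 0.7], [0.02, 0.03, 0.025]))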

The resulting approximation can also be handy in conjunction with reversible jump MCMC, in the sense that RJMCMC algorithms can be run in parallel on different chunks or shards of the entire dataset. Although the computing gain may be reduced by the need for separate approximations.

conjugate priors and sufficient statistics

Posted in Statistics on March 29, 2021 by xi'an

An X validated question rekindled my interest in the connection between sufficiency and conjugacy, by asking whether or not there is an equivalence between the existence of a (finite-dimension) conjugate family of priors and the existence of a sufficient statistic of fixed dimension (fixed in n, the sample size). This is outside exponential families, meaning that the support of the sampling distribution must then vary with the parameter (as implied by the Pitman-Koopman-Darmois lemma).

While the existence of a sufficient statistic T of fixed dimension d whatever the (large enough) sample size n seems to clearly imply the existence of a (finite dimension) conjugate family of priors, or rather of a family associated with each possible dominating (prior) measure,

\mathfrak F=\{ \tilde \pi(\theta)\propto \tilde {f_n}(t_n(x_{1:n})|\theta) \pi_0(\theta)\,;\ n\in \mathbb N, x_{1:n}\in\mathfrak X^n\}

the reverse statement is a wee bit more delicate to prove, due to the varying supports of the sampling or prior distributions. Unless some conjugate prior in the assumed family has an unrestricted support, the argument seems to limit sufficiency to a particular subset of the parameter set. I think that the result remains correct in general, but I could not rigorously wrap up the proof.
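For the record, a textbook illustration of the first implication, and of the varying-support phenomenon at play (my own example, not one taken from the X validated thread), is the Uniform U(0,θ) model: its support depends on the parameter and it admits the fixed-dimension sufficient statistic t_n(x_{1:n}) = max_i x_i whatever n, with the Pareto distributions providing a conjugate family, since

\pi(\theta)\propto\theta^{-\alpha-1}\,\mathbb I_{\theta>\beta}\ \Longrightarrow\ \pi(\theta|x_{1:n})\propto\theta^{-(\alpha+n)-1}\,\mathbb I_{\theta>\max\{\beta,\,t_n(x_{1:n})\}}

the varying support of the prior mirroring the varying support of the likelihood.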

conjugate of a binomial

Posted in Statistics on March 25, 2021 by xi'an

latent variables for a hierarchical Poisson model

Posted in Books, Kids, pictures, Statistics, University life on March 11, 2021 by xi'an

Answering a question on X validated about a rather standard hierarchical Poisson model, and its posterior Gibbs simulation, where observations are (d and w being a document and a word index, resp.)

N_{w,d}\sim\mathcal P(\textstyle\sum_{1\le k\le K} \pi_{k,d}\varphi_{k,w})\qquad(1)

I found myself dragged into an extended discussion on the validity of creating independent Poisson latent variables

N_{k,w,d}\sim\mathcal P(\pi_{k,d}\varphi_{k,w})\qquad(2)

since observing their sum in (1) was preventing the latent variables (2) from being independent. And then found out that the originator of the question had asked on X validated an unanswered and much more detailed question in 2016, even though the notations differ. The question does contain the solution I proposed above, including the Multinomial distribution on the Poisson latent variables given their sum (and the true parameters). As it should be since the derivation was done in a linked 2014 paper by Gopalan, Hofman, and Blei, later published in the Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI). I am thus bemused at the question resurfacing five years later in a much simplified version, but still exhibiting the same difficulty with the conditioning principles…
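To make the conditioning explicit (a minimal sketch of the latent allocation step only, with made-up dimensions and parameter values, rather than the full Gibbs sampler of the question or of Gopalan et al.), given the observed totals in (1) and the current (π,φ), the vector of latent counts in (2) is drawn as a Multinomial with probabilities proportional to the Poisson rates:

import numpy as np

rng = np.random.default_rng(1)

# made-up dimensions and parameters, for illustration only
K, W, D = 3, 5, 4                          # topics, vocabulary size, documents
pi = rng.gamma(1.0, 1.0, size=(K, D))      # pi[k, d]
phi = rng.gamma(1.0, 1.0, size=(K, W))     # phi[k, w]

# observed counts N[w, d] ~ Poisson(sum_k pi[k, d] phi[k, w]), as in (1)
rates = np.einsum('kd,kw->wd', pi, phi)
N = rng.poisson(rates)

# latent allocation step: given N[w, d] and (pi, phi), the vector (N[k, w, d])_k is
# Multinomial(N[w, d], p) with p_k proportional to pi[k, d] phi[k, w], which is exactly
# the conditional distribution of the independent Poisson variables in (2) given their sum
N_latent = np.zeros((K, W, D), dtype=int)
for w in range(W):
    for d in range(D):
        p = pi[:, d] * phi[:, w]
        N_latent[:, w, d] = rng.multinomial(N[w, d], p / p.sum())

assert (N_latent.sum(axis=0) == N).all()   # allocations add back up to the observations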

probability that a vaccinated person is shielded from COVID-19?

Posted in Books, Statistics, Travel, University life on March 10, 2021 by xi'an

Over my flight to Montpellier last week, I read an arXival on a Bayesian analysis of vaccine efficacy, whose full title is “What is the probability that a vaccinated person is shielded from Covid-19? A Bayesian MCMC based reanalysis of published data with emphasis on what should be reported as `efficacy'”, by Giulio D’Agostini and Alfredo Esposito. In short, I was not particularly impressed.

“But the real point we wish to highlight, given the spread of distributions, is that we do not have enough data for drawing sound conclusion.”

The reason for this lack of enthusiasm on my side is that, while the authors’ criticism of an excessive precision in Pfizer, Moderna, or AstraZeneca press releases is appropriate, given that the published confidence intervals do not claim the same precision, a Bayesian reanalysis of the published outcomes of their respective vaccine trials does not show much, simply because there is awfully little data, essentially two to four Binomial-like outcomes. Without further data, the modelling is one of a simple graph of Binomial observations, with two or three probability parameters, which results in a very standard Bayesian analysis that does depend on the modelling choices being made, from a highly unrealistic assumption of homogeneity throughout the population(s) tested for the vaccine(s), to a lack of hyperparameters that could have been shared between vaccinated populations. Parts of the arXival are unrelated and unnecessary, from the highly detailed MCMC algorithm for simulating the posterior (incl. JAGS code) to the reminiscence of Bayes’ and Laplace’s early rendering of inverse probability. (I find it both interesting and revealing that arXiv, just like medRxiv, posts a warning on top of COVID related preprints.)
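For the record, the core of such a reanalysis fits in a few lines (a minimal sketch under conjugate Beta priors and made-up trial counts, not the authors’ JAGS model nor their actual data): with x_v cases among n_v vaccinated participants and x_c cases among n_c controls, posterior draws of the two case probabilities immediately convert into draws of the efficacy 1 − p_v/p_c.

import numpy as np

rng = np.random.default_rng(2)

# made-up counts standing in for a published trial outcome
n_v, x_v = 20_000, 10     # vaccinated arm: participants, cases
n_c, x_c = 20_000, 100    # control arm: participants, cases

# conjugate Beta(1, 1) priors on the two case probabilities
p_v = rng.beta(1 + x_v, 1 + n_v - x_v, size=100_000)
p_c = rng.beta(1 + x_c, 1 + n_c - x_c, size=100_000)

efficacy = 1 - p_v / p_c                   # posterior draws of the vaccine efficacy

print(np.quantile(efficacy, [0.025, 0.5, 0.975]))   # posterior median and 95% credible interval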
