## Advancements in Bayesian Methods and Implementations [to appear]

Posted in Books, Statistics on July 17, 2022 by xi'an

As noted in another post, I wrote a chapter on Bayesian testing for a forthcoming handbook, Advancements in Bayesian Methods and Implementations, which is published by Elsevier at an atrocious price (as usual). Here is the table of contents:

1. Fisher Information, Cramér-Rao and Bayesian Paradigm by Roy Frieden
2. Compound beta binomial distribution functions by Angelo Plastino
3. MCMC for GLMMS by Vivekananda Roy
4. Signal Processing and Bayesian by Chandra Murthy
5. Mathematical theory of Bayesian statistics where all models are wrong by Sumio Watanabe
6. Machine Learning and Bayesian by Jun Zhu
7. Non-parametric Bayes by Stephen Walker
8. [50 shades of] Bayesian testing [of hypotheses] by Christian P. Robert
9. Data Analysis with humans by Sumio Kaski
10. Bayesian Inference under selection by G. Alastair Young
11. Variational inference or Functional horseshoe by Anirban Bhattacharya
12. Generalized Bayes by Ryan Martin

and my chapter is also available on arXiv, quickly gathered from earlier short courses at O’Bayes meetings and some xianblog entries on the topic, hence not containing much novelty!

## Bayesian restricted likelihood with insufficient statistic [slides]

Posted in Books, pictures, Statistics, University life on February 9, 2022 by xi'an

A great Bayesian Analysis webinar this afternoon, with well-balanced presentations by Steve MacEachern and John Lewis and original discussions by Bertrand Clarke and Fabrizio Ruggeri, which attracted 122 participants. I particularly enjoyed Bertrand’s points that likelihoods were more general than models [made in 6 different wordings!] and that this paper was closer to the M-open perspective. I think I eventually got the reason why the approach could be seen as an ABC with ε=0, since the simulated y’s all get the right statistic, but this presentation does not bring a strong argument in favour of the restricted likelihood approach, when considering the methodological and computational effort. The discussion also made me wonder if tools like VAEs could be used towards approximating the distribution of T(y) conditional on the parameter θ. This is also an opportunity to thank my friend Michele Guindani for his hard work as Editor of Bayesian Analysis and in particular for keeping the discussion tradition thriving!

## BA webinar with discussion

Posted in Statistics on February 8, 2022 by xi'an

## distributed evidence

Posted in Books, pictures, Statistics, University life on December 16, 2021 by xi'an

Alexander Buchholz (who did his PhD at CREST with Nicolas Chopin), Daniel Ahfock, and my friend Sylvia Richardson published a great paper on the distributed computation of Bayesian evidence in Bayesian Analysis. The setting is one of distributed data from several sources with no communication between them, which relates to consensus Monte Carlo even though model choice has not been particularly studied from that perspective. The authors operate under the assumption of conditionally conjugate models, i.e., the existence of a data augmentation scheme into an exponential family so that conjugate priors can be used. For a division of the data into S blocks, the fundamental identity in the paper is

$p(y) = \alpha^S \prod_{s=1}^S \tilde p(y_s) \int \prod_{s=1}^S \tilde p(\theta|y_s)\,\text d\theta$

where α is the normalising constant of the sub-prior exp{log[p(θ)]/S} and the other terms are associated with this prior. Under the conditionally conjugate assumption, the integral can be approximated based on the latent variables. Most interestingly, the associated variance is directly connected with the variance of

$p(z_{1:S}|y)\Big/\prod_{s=1}^S \tilde p(z_s|y_s)$

under the joint:

“The variance of the ratio measures the quality of the product of the conditional sub-posterior as an importance sample proposal distribution.”

This assumes that the variance is finite (which is likely). An approximate alternative is proposed, namely to replace the exact sub-posterior with a Normal distribution, as in consensus Monte Carlo, which should obviously require some consideration as to which parameterisation of the model produces the “most normal” (or the least abnormal!) posterior, and which guarantees a finite variance in the importance sampling approximation (as ensured by the strong bounds in Proposition 5). This is a problem shared by the bridgesampling package.

“…if the error that comes from MCMC sampling is relatively small and that the shard sizes are large enough so that the quality of the subposterior normal approximation is reasonable, our suggested approach will result in good approximations of the full data set marginal likelihood.”
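As a sanity check on the fundamental identity above, here is a minimal sketch in a toy setting that is my own choice and not taken from the paper: observations y_i ~ N(θ, σ²) with known σ² and prior θ ~ N(0, τ²). In that case the normalised sub-prior is N(0, Sτ²), the sub-posteriors are exactly Normal, and the integral of their product is available in closed form, so no latent variables or importance sampling are needed. The function names (`log_evidence`, `distributed_log_evidence`) and all numerical values are hypothetical.

```python
import numpy as np

def log_evidence(y, sigma2, prior_var):
    """Log marginal likelihood for y_i ~ N(theta, sigma2), theta ~ N(0, prior_var)."""
    n = len(y)
    # Marginally, y ~ N(0, sigma2 * I + prior_var * 11^T)
    cov = sigma2 * np.eye(n) + prior_var * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def distributed_log_evidence(shards, sigma2, tau2):
    """log p(y) rebuilt from shard-level quantities only (toy Gaussian case)."""
    S = len(shards)
    sub_var = S * tau2  # variance of the normalised sub-prior p(theta)^(1/S) / alpha
    # alpha = integral of p(theta)^(1/S) for the N(0, tau2) prior
    log_alpha = 0.5 * np.log(2 * np.pi * sub_var) - np.log(2 * np.pi * tau2) / (2 * S)
    # Sub-evidences, each computed under the sub-prior
    log_sub_evid = sum(log_evidence(ys, sigma2, sub_var) for ys in shards)
    # Sub-posterior precisions and means (conjugate Normal updates)
    lam = np.array([len(ys) / sigma2 + 1.0 / sub_var for ys in shards])
    m = np.array([ys.sum() / sigma2 for ys in shards]) / lam
    # Closed-form log of the integral of the product of the S Normal densities
    Lam = lam.sum()
    mu = (lam * m).sum() / Lam
    log_int = (0.5 * np.sum(np.log(lam / (2 * np.pi)))
               + 0.5 * np.log(2 * np.pi / Lam)
               - 0.5 * (np.sum(lam * m**2) - Lam * mu**2))
    return S * log_alpha + log_sub_evid + log_int

rng = np.random.default_rng(0)
sigma2, tau2, S = 1.0, 4.0, 3          # illustrative values only
y = rng.normal(1.5, np.sqrt(sigma2), size=12)
shards = np.array_split(y, S)          # S blocks with no communication between them

full = log_evidence(y, sigma2, tau2)
dist = distributed_log_evidence(shards, sigma2, tau2)
print(full, dist)                      # the two log-evidences should agree
```

Outside such exactly conjugate cases the sub-posteriors are not Normal, and this closed form only corresponds to the approximate alternative discussed above, with an error depending on how Normal each sub-posterior actually is.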

The resulting approximation can also be handy in conjunction with reversible jump MCMC, in the sense that RJMCMC algorithms can be run in parallel on different chunks or shards of the entire dataset. Although the computing gain may be reduced by the need for separate approximations.

## finding our way in the dark

Posted in Books, pictures, Statistics on November 18, 2021 by xi'an

The paper Finding our Way in the Dark: Approximate MCMC for Approximate Bayesian Methods by Evgeny Levi and (my friend) Radu Craiu recently got published in Bayesian Analysis. The central motivation for their work is that both ABC and synthetic likelihood are costly methods when the data is large and does not allow for smaller summaries, that is, when summaries S of smaller dimension cannot be directly simulated. The idea is to try to estimate

$h(\theta)=\mathbb{P}_\theta(d(S,S^\text{obs})\le\epsilon)$

since this is the substitute for the likelihood used for ABC. (A related idea is to build an approximate and conditional [on θ] distribution on the distance, an idea with which Doc. Stoehr and I played a wee bit without getting anything definitely interesting!) This is a one-dimensional object, hence non-parametric estimates could be considered… For instance using k-nearest neighbour methods (which were already linked with ABC by Gérard Biau and co-authors). A random forest could also be used (?). Or neural nets. The method still requires a full simulation of new datasets, so I wonder at the gain unless the replacement of the naïve indicator with h(θ) brings a clear improvement to the approximation, and hence requires many fewer simulations. The ESS reduction is definitely improved, especially since the CPU cost is higher. Could this be associated with the recourse to independent proposals?
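To make the k-nearest neighbour idea concrete, here is a minimal sketch of estimating h(θ) by averaging past acceptance indicators over the k closest simulated parameter values. This is a toy illustration under my own assumptions and not the authors' algorithm: the model is N(θ, 1) with the sample mean as summary (so that the true h(θ) is known), and the names `knn_h`, `past_thetas` and `past_hits`, as well as all numerical settings, are hypothetical.

```python
import math
import numpy as np

def knn_h(theta, past_thetas, past_hits, k):
    """k-NN estimate of h(theta) = P_theta(d(S, S_obs) <= eps) from past simulations."""
    # Indices of the k past parameter values closest to theta (requires k < len(past_thetas))
    idx = np.argpartition(np.abs(past_thetas - theta), k)[:k]
    # Average of the 0/1 acceptance indicators over those neighbours
    return past_hits[idx].mean()

rng = np.random.default_rng(1)
n, N, k, eps = 20, 20_000, 400, 0.2    # illustrative settings only
s_obs = 0.0                            # observed summary (assumed value)

# Past simulations: parameter values and full datasets, reduced to the sample mean
past_thetas = rng.uniform(-3.0, 3.0, size=N)
summaries = rng.normal(past_thetas[:, None], 1.0, size=(N, n)).mean(axis=1)
past_hits = (np.abs(summaries - s_obs) <= eps).astype(float)

def true_h(theta):
    """Exact h(theta) in this toy model, where the summary is N(theta, 1/n)."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    sd = 1.0 / math.sqrt(n)
    return Phi((s_obs + eps - theta) / sd) - Phi((s_obs - eps - theta) / sd)

h_hat = knn_h(0.0, past_thetas, past_hits, k)
h_true = true_h(0.0)
print(h_hat, h_true)
```

The estimate recycles indicators from neighbouring values of θ instead of relying on the single naïve indicator at the current value, which is where a gain in the number of simulations could come from.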

In a sense, Bayesian synthetic likelihood does not convey the same appeal, since it is a bit more of a tough cookie: approximating the mean and variance is multidimensional. (BSL is always more expensive!)

As a side remark, the authors use two chains in parallel to simplify convergence proofs, as we did a while ago with AMIS!