## distributed evidence

Posted in Books, pictures, Statistics, University life on December 16, 2021 by xi'an

Alexander Buchholz (who did his PhD at CREST with Nicolas Chopin), Daniel Ahfock, and my friend Sylvia Richardson published a great paper on the distributed computation of Bayesian evidence in Bayesian Analysis. The setting is one of distributed data from several sources with no communication between them, which relates to consensus Monte Carlo even though model choice has not been particularly studied from that perspective. The authors operate under the assumption of conditionally conjugate models, i.e., the existence of a data augmentation scheme into an exponential family so that conjugate priors can be used. For a division of the data into S blocks, the fundamental identity in the paper is

$p(y) = \alpha^S \prod_{s=1}^S \tilde p(y_s) \int \prod_{s=1}^S \tilde p(\theta|y_s)\,\text d\theta$

where α is the normalising constant of the sub-prior exp{log[p(θ)]/S} and the other terms are associated with this prior. Under the conditionally conjugate assumption, the integral can be approximated based on the latent variables. Most interestingly, the associated variance is directly connected with the variance of

$p(z_{1:S}|y)\Big/\prod_{s=1}^S \tilde p(z_s|y_s)$

under the joint:

“The variance of the ratio measures the quality of the product of the conditional sub-posterior as an importance sample proposal distribution.”

This assumes the variance is finite (which is likely). An approximate alternative is proposed, namely to replace the exact sub-posterior with a Normal distribution, as in consensus Monte Carlo, which should obviously require some consideration as to which parameterisation of the model produces the “most normal” (or the least abnormal!) posterior and ensures a finite variance in the importance sampling approximation (as ensured by the strong bounds in Proposition 5). This is a problem shared by the bridgesampling package.

“…if the error that comes from MCMC sampling is relatively small and that the shard sizes are large enough so that the quality of the subposterior normal approximation is reasonable, our suggested approach will result in good approximations of the full data set marginal likelihood.”

The resulting approximation can also be handy in conjunction with reversible jump MCMC, in the sense that RJMCMC algorithms can be run in parallel on different chunks or shards of the entire dataset, although the computing gain may be reduced by the need for separate approximations.
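As a sanity check of the fundamental identity above, here is a minimal sketch in a toy conjugate Normal model, which is my own choice and not an example taken from the paper: observations y_i ~ N(θ, σ²) with prior θ ~ N(0, τ²). In that case the sub-prior p(θ)^{1/S}/α is N(0, Sτ²), the sub-evidences and sub-posteriors are available in closed form, and the remaining integral is computed by a plain Riemann sum on a grid. All names and numerical values (`sigma2`, `tau2`, `S`, the grid bounds) are illustrative assumptions.

```python
import numpy as np

def log_evidence(y, sigma2, prior_var):
    """Closed-form log p(y) for y_i ~ N(theta, sigma2) with theta ~ N(0, prior_var)."""
    n, total = len(y), np.sum(y)
    quad = (np.sum(y ** 2) - prior_var * total ** 2 / (sigma2 + n * prior_var)) / sigma2
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - 0.5 * np.log1p(n * prior_var / sigma2)
            - 0.5 * quad)

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(0)
sigma2, tau2, S = 1.0, 4.0, 3          # toy values, chosen for illustration only
y = rng.normal(0.7, np.sqrt(sigma2), size=30)
shards = np.array_split(y, S)

# The sub-prior p(theta)^(1/S) / alpha is N(0, S * tau2);
# alpha is the integral of p(theta)^(1/S) over theta.
sub_var = S * tau2
log_alpha = 0.5 * np.log(2 * np.pi * sub_var) - np.log(2 * np.pi * tau2) / (2 * S)

# Sub-evidences under the sub-prior, one per shard.
log_sub_evidences = [log_evidence(ys, sigma2, sub_var) for ys in shards]

# Product of the sub-posterior densities, evaluated on a uniform grid.
theta = np.linspace(-10.0, 10.0, 20001)
product = np.ones_like(theta)
for ys in shards:
    post_var = 1.0 / (len(ys) / sigma2 + 1.0 / sub_var)
    post_mean = post_var * np.sum(ys) / sigma2
    product *= normal_pdf(theta, post_mean, post_var)

# Riemann sum approximation of the integral of the product.
integral = np.sum(product) * (theta[1] - theta[0])

log_rhs = S * log_alpha + sum(log_sub_evidences) + np.log(integral)
log_lhs = log_evidence(y, sigma2, tau2)   # full-data evidence, closed form
print(log_lhs, log_rhs)
```

The two printed values should agree up to the quadrature error. In a realistic model none of the three ingredients is available in closed form, which is where the latent-variable importance sampling (or the Normal approximation) discussed in the paper comes in.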

## Savage Award session today at JSM

Posted in Kids, Statistics, Travel, University life on August 3, 2020 by xi'an

Pleased to broadcast the JSM session dedicated to the 2020 Savage Award, taking place today at 13:00 ET (17:00 GMT), with two of the Savage nominees being former OxWaSP students (and Warwick PhD students). For those who have not registered for JSM, the talks are also available on Bayeslab. (As it happens, I was also a member of the committee this year, but do not think this could be deemed a CoI!)

Session 112, Mon, 8/3/2020, 1:00 PM – 2:50 PM, Virtual

Savage Award Session: Invited Papers, International Society for Bayesian Analysis (ISBA)

Organizer(s): Maria De Iorio, University College London

Chair(s): Maria De Iorio, University College London

| Time | Talk | Speaker |
|---|---|---|
| 1:05 PM | Bayesian Dynamic Modeling and Forecasting of Count Time Series | Lindsay Berry, Berry Consultants |
| 1:30 PM | Machine Learning Using Approximate Inference: Variational and Sequential Monte Carlo Methods | Christian Andersson Naesseth, Columbia University |
| 1:55 PM | Recent Advances in Bayesian Probabilistic Numerical Integration | Francois-Xavier Briol, University College London |
| 2:20 PM | Factor regression for dimensionality reduction and data integration techniques with applications to cancer data | Alejandra Avalos Pacheco, Harvard Medical School |
| 2:45 PM | Floor Discussion | |

## JB³ [Junior Bayes beyond the borders]

Posted in Books, Statistics, University life on June 22, 2020 by xi'an

Bocconi and j-ISBA are launching a webinar series for and by junior Bayesian researchers. The first talk is on 25 June at 3pm UTC/GMT (5pm CET) with Francois-Xavier Briol, one of the laureates of the 2020 Savage Thesis Prize (and a former graduate of OxWaSP, the Oxford-Warwick doctoral training program), on Stein’s method for Bayesian computation, with Nicolas Chopin as discussant.

As pointed out on their webpage,

“Due to the importance of the above endeavor, JB³ will continue after the health emergency as an annual series. It will include various refinements aimed at increasing the involvement of the whole junior Bayesian community and facilitating a broader participation to the online seminars all over the world via various online solutions.”

Thanks to all my friends at Bocconi for running this experiment!

## PhD studentships at Warwick

Posted in Kids, pictures, Statistics, University life on May 2, 2019 by xi'an

There is an exciting opening for several PhD positions at Warwick, in the departments of Statistics and of Mathematics, as part of the Centre for Doctoral Training in Mathematics and Statistics newly created by the University. CDT studentships are funded for four years and funding is open to students from the European Union without restrictions. (No Brexit!) Funding includes a stipend at UKRI rates and tuition fees at UK/EU rates. Applications are made via the University of Warwick Online Application Portal and should be made as quickly as possible since the funding will be allocated on a first-come, first-served basis. For more details, contact the CDT director, Martyn Plummer. I cannot but strongly encourage interested students to apply as this is a great opportunity to start a research career in a fantastic department!

## distributed posteriors

Posted in Books, Statistics, Travel, University life on February 27, 2019 by xi'an

Another presentation by our OxWaSP students introduced me to the notion of distributed posteriors, following a 2018 paper by Botond Szabó and Harry van Zanten. This notion corresponds to the construction of posteriors when conducting a divide & conquer strategy. The authors show that an adaptation of the prior to the division of the sample is necessary to recover the (minimax) convergence rate obtained in the non-distributed case. This is somewhat annoying, except that the adaptation amounts to taking the original prior to the power 1/m, where m is the number of divisions. They further show that when the regularity (parameter) of the model is unknown, the optimal rate cannot be recovered unless stronger assumptions are made on the non-zero parameters of the model.

“First of all, we show that depending on the communication budget, it might be advantageous to group local machines and let different groups work on different aspects of the high-dimensional object of interest. Secondly, we show that it is possible to have adaptation in communication restricted distributed settings, i.e. to have data-driven tuning that automatically achieves the correct bias-variance trade-off.”

I find the paper of considerable interest for scalable MCMC methods, even though the setting may happen to sound too formal, because the study incorporates parallel computing constraints. (Although I did not investigate the more theoretical aspects of the paper.)
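To see why the power 1/m is the natural correction, here is a small sketch in a parametric toy model (a Normal mean with a Normal prior), which is far simpler than the nonparametric setting of the paper and only meant as an illustration. Multiplying m sub-posteriors that each use the full prior counts the prior m times, while raising the prior to the power 1/m on each machine makes the product recover the full-data posterior exactly. The function names and numerical values below are my own assumptions.

```python
import numpy as np

def gaussian_posterior(y, sigma2, prior_var):
    """Posterior (mean, variance) of theta for y_i ~ N(theta, sigma2), theta ~ N(0, prior_var)."""
    precision = len(y) / sigma2 + 1.0 / prior_var
    return np.sum(y) / sigma2 / precision, 1.0 / precision

def product_of_gaussians(means, variances):
    """(mean, variance) of the normalised product of Gaussian densities."""
    precisions = 1.0 / np.asarray(variances)
    var = 1.0 / precisions.sum()
    return var * np.sum(precisions * np.asarray(means)), var

rng = np.random.default_rng(1)
sigma2, tau2, m = 1.0, 2.0, 4           # toy values, chosen for illustration only
y = rng.normal(1.5, np.sqrt(sigma2), size=40)
shards = np.array_split(y, m)

full = gaussian_posterior(y, sigma2, tau2)

# Prior to the power 1/m: a N(0, tau2) density raised to 1/m is, up to a constant, N(0, m * tau2).
tempered_parts = [gaussian_posterior(ys, sigma2, m * tau2) for ys in shards]
tempered = product_of_gaussians(*zip(*tempered_parts))

# Naive version: every machine uses the full prior, which is thus counted m times.
naive_parts = [gaussian_posterior(ys, sigma2, tau2) for ys in shards]
naive = product_of_gaussians(*zip(*naive_parts))

print("full     :", full)
print("tempered :", tempered)
print("naive    :", naive)
```

In this conjugate case the tempered product matches the full posterior, while the naive product is over-concentrated, with precision n/σ² + m/τ² instead of n/σ² + 1/τ².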