Archive for annealed importance sampling

telescope on evidence for graphical models

Posted in Books, Statistics, University life on February 29, 2024 by xi'an

A recent paper by Anindya Bhadra, Ksheera Sagar, Sayantan Banerjee (whom I met during Rito's seminar, since he was also visiting Ismael in Paris, and who mentioned this work), and Jyotishka Datta on computing the evidence for graphical models. Obtaining an approximation of the evidence attached to a model and a prior on the covariance matrix Ω is a challenge they manage to address in a particularly clever manner.

“the conditional posterior density [of the last column of the covariance matrix] can be evaluated as a product of normal and gamma densities under suitable priors (…) We resolve this [difficulty with the integrated likelihood] by evaluating the required densities in one row or column at a time, and proceeding backwards starting from the p-th row, with appropriate adjustments to Ωp×p at each step via Schur complement. “

Using a telescoping trick, the authors exploit the fact that the decomposition

\log f(y_{1:p})=\log f(y_p|y_{1:p-1},\theta_p)+\log f (y_{1:p-1}|\theta_p)+\log f(\theta_p)-\log f(\theta_p|y_{1:p})

involves a problematic second term that can be ignored thanks to successive cancellations, as shown in Figure 1. The other terms are manageable for some classes of priors on Ω, like a Wishart. This allows the authors to call for Chib's (two-block) method, which requires two independent MCMC runs. An unfortunate aspect of the approach, however, is that its computational complexity is of order O(Mp), where M is the number of MCMC samples, since the telescoping trick involves calling Chib's approach for each of the p columns of Ω. While the numerical outcomes compare with nested sampling, annealed importance sampling, and even harmonic mean estimates (!), the computing time usually exceeds those of these other methods, especially harmonic mean estimates. For the specific G-Wishart case, the solution proposed by Atay-Kayis and Massam (2005) proves far superior. Since the main purpose of computing the evidence is in deriving Bayes factors, I wonder at possible gains in recycling simulations between models, even though this would seem to call for bridge sampling, not considered in the paper.
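To recall how Chib's identity works outside the graphical-model setting, here is a minimal Python sketch on a toy Normal model with unknown mean and variance, nothing to do with the paper's Ω setting: all priors, data, and tuning choices below are illustrative. The log evidence is log f(y|θ*) + log π(θ*) − log π(θ*|y), with the posterior ordinate of μ* estimated by Rao–Blackwellisation over Gibbs draws and the ordinate of σ²* available in closed form given μ*.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(0)
y = rng.normal(1.0, 2.0, size=50)          # toy data
n, ybar = len(y), y.mean()

# illustrative conjugate-style priors
m0, s0sq = 0.0, 10.0                        # mu ~ N(m0, s0sq)
a0, b0 = 2.0, 2.0                           # sigma^2 ~ InvGamma(a0, b0)

# Gibbs sampler over (mu, sigma^2)
M, mu, s2 = 5000, ybar, y.var()
mus, s2s = np.empty(M), np.empty(M)
for m in range(M):
    var = 1.0 / (1.0 / s0sq + n / s2)       # mu | sigma^2, y
    mean = var * (m0 / s0sq + n * ybar / s2)
    mu = rng.normal(mean, np.sqrt(var))
    s2 = stats.invgamma.rvs(a0 + n / 2,     # sigma^2 | mu, y
                            scale=b0 + 0.5 * np.sum((y - mu) ** 2),
                            random_state=rng)
    mus[m], s2s[m] = mu, s2

mu_s, s2_s = mus.mean(), s2s.mean()         # a high-posterior-density point

# Chib's identity: log m(y) = log f(y|θ*) + log π(θ*) − log π(θ*|y)
loglik = stats.norm.logpdf(y, mu_s, np.sqrt(s2_s)).sum()
logprior = (stats.norm.logpdf(mu_s, m0, np.sqrt(s0sq))
            + stats.invgamma.logpdf(s2_s, a0, scale=b0))
# π(μ*|y): Rao–Blackwell average of full conditionals over the Gibbs draws
vs = 1.0 / (1.0 / s0sq + n / s2s)
ms = vs * (m0 / s0sq + n * ybar / s2s)
log_pi_mu = logsumexp(stats.norm.logpdf(mu_s, ms, np.sqrt(vs))) - np.log(M)
# π(σ²*|μ*,y): exact full conditional, no reduced run needed in this toy case
log_pi_s2 = stats.invgamma.logpdf(s2_s, a0 + n / 2,
                                  scale=b0 + 0.5 * np.sum((y - mu_s) ** 2))
log_evidence = loglik + logprior - log_pi_mu - log_pi_s2
print(log_evidence)
```

In the graphical-model setting above, such an evidence evaluation is what gets repeated for each of the p columns of Ω, hence the O(Mp) cost.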

diffusions, sampling, and transport

Posted in Books, pictures, Statistics, Travel, University life on November 21, 2022 by xi'an

The third and final day of the workshop was shortened for me as I had to catch an early flight back to Paris (and as I got overly conservative in my estimation of the time needed to return to JFK, catching a train with no delay at Penn Station and thus finding myself with two hours free before boarding, hence reviewing a remaining Biometrika submission at the airport while waiting). As a result I missed the afternoon talks.

The morning was mostly about using scores for simulation (a topic of which I was mostly unaware), with Yang Song giving the introductory lecture on creating better [cf pix left] generative models via the score function, with a massive production of his on the topic (but too many image simulations of dogs, cats, and celebrities!). Directly estimating the score is feasible via Fisher divergence or score matching à la Hyvärinen (with a return of Stein's unbiased estimator of the risk!). And relying on estimated scores to simulate / generate by Langevin dynamics or other MCMC methods that do not require density evaluations. Due to poor performance in low-density / poorly-learned regions, a fix is randomisation / tempering, but the resolution (as exposed) sounded clumsy. (And made me wonder at using some more advanced form of deconvolution since the randomisation pattern is controlled.) The talk showed some impressive text-to-image simulations used by an animation studio!
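The core mechanism is easily sketched: once a score s_θ(x) ≈ ∇_x log p(x) is available, no density evaluation is needed to simulate. The Python toy below (my own sketch, not from the talk) runs unadjusted Langevin dynamics with the exact score of a standard Normal standing in for a learned score network:

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in for a learned score network: the exact score of N(0,1),
# namely ∇_x log p(x) = -x (in practice s_θ would come from score matching)
def score(x):
    return -x

# unadjusted Langevin dynamics: x ← x + ε·score(x) + sqrt(2ε)·z,  z ~ N(0,1)
eps, steps = 1e-2, 2000
x = rng.normal(5.0, 1.0, size=1000)   # 1000 chains started far from the target
for _ in range(steps):
    x = x + eps * score(x) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)

print(x.mean(), x.std())   # should approach 0 and 1, up to discretisation bias
```

The low-density issue mentioned in the talk is visible here in embryo: where the score estimate is poor (regions the training data never visited), the drift term pushes the chains in essentially arbitrary directions, hence the tempering / randomisation fixes.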


And then my friend Arnaud Doucet continued on the same theme, motivated by the estimation of normalising constants through annealed importance sampling [Yuling's meta-perspective comes back to mind in that the geometric mixture is not the only choice, but with which objective?]. In AIS, as in a series of Arnaud's works, like the 2006 SMC Read Paper with Pierre Del Moral and Ajay Jasra, the importance (!) of some auxiliary backward kernels goes beyond theoretical arguments, with the ideal sequence being provided by a Langevin diffusion. Hence involving a score, learned as in the previous talk. Arnaud reformulated this issue as creating a transportation map and its reverse, leading to their recent Schrödinger bridge generative model. Which [imho] both brings a unifying perspective to his work and an efficient way to bridge prior and posterior in AIS. A most profitable morn for me!
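For readers less familiar with AIS, the basic recipe (à la Neal) can be sketched in a few lines of Python on a toy target whose normalising constant is known analytically; the geometric schedule and random-walk Metropolis kernel below are my own illustrative choices, not Arnaud's diffusion-based setup:

```python
import numpy as np

rng = np.random.default_rng(2)

# unnormalised target: N(3, 0.5²) without its constant; true Z = sqrt(2π)·0.5
def log_gamma(x):
    return -((x - 3.0) ** 2) / (2 * 0.25)

def log_p0(x):                        # normalised base density, N(0, 1)
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

# geometric bridge: log γ_β = (1−β)·log p0 + β·log γ
def log_gamma_beta(x, b):
    return (1 - b) * log_p0(x) + b * log_gamma(x)

N, betas = 2000, np.linspace(0.0, 1.0, 101)
x = rng.standard_normal(N)            # particles drawn from p0
logw = np.zeros(N)
for b_prev, b in zip(betas[:-1], betas[1:]):
    # accumulate incremental weights before moving the particles
    logw += log_gamma_beta(x, b) - log_gamma_beta(x, b_prev)
    # one random-walk Metropolis step leaving γ_b invariant (forward kernel)
    prop = x + 0.5 * rng.standard_normal(N)
    accept = np.log(rng.uniform(size=N)) < log_gamma_beta(prop, b) - log_gamma_beta(x, b)
    x = np.where(accept, prop, x)

Z_hat = np.exp(logw).mean()           # unbiased estimate of Z ≈ 1.2533
print(Z_hat)
```

The Metropolis step here plays the role whose ideal, continuous-time version is the Langevin diffusion of the talk, with the corresponding backward kernel choice driving the variance of the weights.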

Overall, this was an exhilarating workshop, full of discoveries for me and providing me with the opportunity to meet and exchange with mostly people I had not met before. Thanks to Bob Carpenter and Michael Albergo for organising and running the workshop!