Archive for harmonic mean estimator

telescope on evidence for graphical models

Posted in Books, Statistics, University life on February 29, 2024 by xi'an

A recent paper by Anindya Bhadra, Ksheera Sagar, Sayantan Banerjee (whom I met during Rito's seminar, since he was also visiting Ismael in Paris, and who mentioned this work), and Jyotishka Datta addresses the computation of the evidence for graphical models. Obtaining an approximation of the evidence attached to a model and a prior on the covariance matrix Ω is a challenge they manage to address in a particularly clever manner.

“the conditional posterior density [of the last column of the covariance matrix] can be evaluated as a product of normal and gamma densities under suitable priors (…) We resolve this [difficulty with the integrated likelihood] by evaluating the required densities one row or column at a time, and proceeding backwards starting from the p-th row, with appropriate adjustments to Ω_{p×p} at each step via Schur complement.”

Using a telescoping trick, the authors exploit the fact that the decomposition

\log f(y_{1:p})=\log f(y_p|y_{1:p-1},\theta_p)+\log f (y_{1:p-1}|\theta_p)+\log f(\theta_p)-\log f(\theta_p|y_{1:p})

involves a problematic second term that can be ignored thanks to successive cancellations, as shown by Figure 1. The other terms are manageable for some classes of priors on Ω. Like a Wishart. This allows them to call for Chib's (two-block) method, which requires two independent MCMC runs. Actually, an unfortunate aspect of the approach is that its computational complexity is of order O(Mp), where M is the number of MCMC samples, due to the telescoping trick calling for Chib's approach for each of the p columns of Ω. While the numerical outcomes are comparable with nested sampling, annealed importance sampling, and even harmonic mean estimates (!), the computing time usually exceeds those of these other methods, esp. harmonic mean estimates. For the specific G-Wishart case, the solution proposed by Atay-Kayis and Massam (2005) proves far superior. Since the main purpose of using evidence is in deriving Bayes factors, I wonder at possible gains in recycling simulations between models, even though this would seem to call for bridge sampling, not considered in the paper.
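As a quick illustration of the building block being telescoped, here is a minimal sketch of Chib's evidence identity on a toy conjugate Normal model, where every density is available in closed form (unlike the graphical-model setting, where MCMC output must stand in for the exact posterior density); the model and values below are purely illustrative:

import numpy as np
from scipy import stats

# toy conjugate model: y_i ~ N(theta, 1), theta ~ N(0, tau2)
rng = np.random.default_rng(0)
n, tau2 = 20, 4.0
y = rng.normal(1.0, 1.0, n)

# exact posterior: theta | y ~ N(mu_n, s2_n)
s2_n = 1.0 / (n + 1.0 / tau2)
mu_n = s2_n * y.sum()

# Chib's identity: log m(y) = log f(y|θ*) + log π(θ*) − log π(θ*|y),
# evaluated at a high-posterior-density point θ*
theta_star = mu_n
log_m_chib = (stats.norm.logpdf(y, theta_star, 1.0).sum()
              + stats.norm.logpdf(theta_star, 0.0, np.sqrt(tau2))
              - stats.norm.logpdf(theta_star, mu_n, np.sqrt(s2_n)))

# exact log evidence for comparison: y ~ N_n(0, I_n + tau2 J_n)
log_m_exact = stats.multivariate_normal.logpdf(y, np.zeros(n), np.eye(n) + tau2)
print(log_m_chib, log_m_exact)  # identical up to floating-point error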

reciprocal importance sampling

Posted in Books, pictures, Statistics on May 30, 2023 by xi'an

In a recent arXival, Metodiev et al. (including my friend Adrian Raftery, who is spending the academic year in Paris) proposed a new version of reciprocal importance sampling, expanding the proposal we made with Darren Wraith (2009) of using a Uniform over an HPD region. It is called THAMES, for truncated harmonic mean estimator, hence the picture (of London, not Paris!).

“…[Robert and Wraith (2009)] method has not yet been fully developed for realistic, higher-dimensional situations. For example, we know of no simple way to compute the volume of the convex hull of a set of points in higher dimensions.”

They suggest replacing the convex hull of the HPD points with an ellipsoid ϒ derived from a Normal distribution centred at the highest of the HPD points, whose covariance matrix is estimated from the whole (?) posterior sample. Which is somewhat surprising in that this ellipsoid may well include low-probability regions when the posterior is multimodal. For instance, the estimator is biased when the posterior vanishes on parts of ϒ. And the finiteness of its variance remains unclear, depending on how fast the posterior decays to zero on these parts.

The central feature of the paper is selecting the radius of the ellipsoid that minimises the variance of the (reciprocal) evidence estimate, under asymptotic normality of the posterior. This radius roughly corresponds to our HPD region in that 50% of the sample falls within. The authors also notice that separate samples should be used to estimate the ellipsoid and to estimate the evidence. And that a correction is necessary when the posterior support is restricted. (Examples do not include multimodal targets, apparently.)
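For concreteness, here is a minimal sketch of reciprocal importance sampling with a Uniform over an ellipsoid, on a toy bivariate conjugate model where the evidence is known in closed form. The centring, 50% radius calibration, and sample splitting below mimic the description above but are not the authors' exact recipe:

import numpy as np
from scipy import stats
from scipy.special import gammaln, logsumexp

rng = np.random.default_rng(1)
d, n, tau2 = 2, 30, 4.0
y = rng.normal([0.5, -0.5], 1.0, (n, d))

# exact posterior for this toy model: theta | y ~ N(mu, s2 I_d)
s2 = 1.0 / (n + 1.0 / tau2)
mu = s2 * y.sum(axis=0)
N = 20_000
sample = rng.normal(mu, np.sqrt(s2), (N, d))  # stands in for MCMC output

# first half of the sample fits the ellipsoid, second half estimates
fit, est = sample[: N // 2], sample[N // 2:]
centre = fit.mean(axis=0)
cov = np.cov(fit.T)
prec = np.linalg.inv(cov)

# radius such that about 50% of the sample falls inside
maha_fit = np.einsum('ij,jk,ik->i', fit - centre, prec, fit - centre)
c2 = np.quantile(maha_fit, 0.5)
log_vol = ((d / 2) * np.log(np.pi * c2) - gammaln(d / 2 + 1)
           + 0.5 * np.linalg.slogdet(cov)[1])

# reciprocal importance sampling: 1/m ≈ average of 1_ϒ(θ)/[vol(ϒ) f(y|θ) π(θ)]
maha = np.einsum('ij,jk,ik->i', est - centre, prec, est - centre)
inside = maha <= c2
log_lik = stats.norm.logpdf(y[None, :, :], est[:, None, :], 1.0).sum(axis=(1, 2))
log_prior = stats.norm.logpdf(est, 0.0, np.sqrt(tau2)).sum(axis=1)
log_inv_m = logsumexp(-(log_vol + log_lik + log_prior)[inside]) - np.log(len(est))

# exact log evidence, one dimension at a time
log_m_exact = sum(stats.multivariate_normal.logpdf(y[:, j], np.zeros(n),
                                                   np.eye(n) + tau2)
                  for j in range(d))
print(-log_inv_m, log_m_exact)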

back to a correction of the harmonic mean estimator

Posted in Books, Statistics on May 11, 2023 by xi'an

In a 2009 JCGS paper, Peter Lenk proposed a bias correction of the harmonic mean estimator, which is somewhat surprising given that the estimator usually has no finite variance and hence that its consistency is purely formal, since no speed of convergence can be taken for granted. In particular, the conjugate Normal model serving as a motivation leads to an infinite variance. The author however blames the poor behaviour of the harmonic mean estimator on the overly concentrated support of the posterior distribution, while having no reservation about the original identity (with standard notations)

m(x)^{-1} = \int \dfrac{\pi(\theta|x)}{f(x|\theta)}\,\text d \theta

but suggests the corrected version

m(x)^{-1} = \int_A \dfrac{\pi(\theta|x)}{f(x|\theta)}\,\text d \theta\big/ \Pi(A)

although this identity only holds when A is within the support of the posterior. (In which case it connects with our own 2009 correction.) The author opts for a set A corresponding to a "simulation support" of the posterior, a notion with a rather vague meaning, if somewhat connected with the nested sampling starting set.
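To make the correction concrete, here is a minimal sketch on the conjugate Normal model, taking A as the range of the simulated posterior sample, which is only one possible reading of the "simulation support" (Lenk's actual choice may differ):

import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(2)
n, tau2 = 20, 4.0
y = rng.normal(1.0, 1.0, n)

# exact posterior: theta | y ~ N(mu, s2)
s2 = 1.0 / (n + 1.0 / tau2)
mu = s2 * y.sum()
N = 10_000
theta = rng.normal(mu, np.sqrt(s2), N)  # stands in for MCMC output
log_lik = stats.norm.logpdf(y[None, :], theta[:, None], 1.0).sum(axis=1)

# raw harmonic mean estimator, which overestimates the evidence
log_m_raw = -(logsumexp(-log_lik) - np.log(N))

# correction: multiply by the prior mass Π(A) of the simulation support,
# since all simulated values fall within A = [theta.min(), theta.max()]
a, b = theta.min(), theta.max()
tau = np.sqrt(tau2)
log_prior_A = np.log(stats.norm.cdf(b, 0, tau) - stats.norm.cdf(a, 0, tau))
log_m_corrected = log_prior_A + log_m_raw

# exact log evidence for comparison
log_m_exact = stats.multivariate_normal.logpdf(y, np.zeros(n), np.eye(n) + tau2)
print(log_m_raw, log_m_corrected, log_m_exact)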

ABConic mean evidence approximation

Posted in Statistics on March 7, 2023 by xi'an

Following a question on X validated about evidence approximation in ABC settings, i.e., on returning an approximation of the evidence based on the outputs of parallel ABC runs for the models under comparison, I wondered at the relevance of an harmonic mean estimator in that context.

Rather than using the original ABC algorithm that proposes a model, a parameter from that model, and a simulated dataset from that model with that parameter, an alternate, cost-free, solution would be to run an ABC version of the harmonic mean evidence approximation à la Newton & Raftery (1994). Since

\mathcal Z=1\Big/\int \dfrac{\pi(\theta|D)}{p(D|\theta)}\,\text d\theta

the evidence can formally be approximated by

\hat{\mathcal Z} =1\Big/\frac{1}{N}\sum_{i=1}^N\frac{1}{p(D|\theta_i)}\qquad\theta_i\sim\pi(\theta|D)

and its ABC version is

\hat{\mathcal Z} =1\Big/\frac{1}{N}\sum_{i=1}^N\frac{1}{K_\epsilon(d(D,D^\star(\theta_i)))}\qquad\theta_i\sim\pi^\epsilon(\theta|D)

where Kε(.) is the kernel used for the ABC acceptance/rejection step and d(.,.) is the distance used to measure the discrepancy between samples. Since the kernel values are already computed during the ABC run, the additional cost of the evidence approximation is null. Obviously, an indicator kernel does not return a useful estimate, but something like a Cauchy kernel could do.
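As an illustration, here is a minimal sketch on a Normal-Normal toy model, with the sample mean as summary statistic and a Cauchy kernel on the distance; the scale ε, the distance, and the shortcut of simulating only the sufficient mean are assumptions of mine, as the post does not spell out its exact calibration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, tau2 = 10, 4.0
D = rng.normal(1.0, 1.0, n)  # observed data

eps = 0.1
def kern(u):
    # Cauchy kernel on the distance, maximal at u = 0
    return 1.0 / (np.pi * eps * (1.0 + (u / eps) ** 2))

# ABC with kernel acceptance: θ from the prior, pseudo-data from the model;
# only the sufficient sample mean is simulated, a shortcut for this toy model
N = 200_000
theta = rng.normal(0.0, np.sqrt(tau2), N)
pseudo_mean = rng.normal(theta, 1.0 / np.sqrt(n))
dist = np.abs(D.mean() - pseudo_mean)
accept = rng.uniform(size=N) < kern(dist) / kern(0.0)

# ABC harmonic mean estimate of the evidence, at zero extra cost
Z_hat = 1.0 / np.mean(1.0 / kern(dist[accept]))

# exact evidence of the full data, for comparison
Z_exact = np.exp(stats.multivariate_normal.logpdf(D, np.zeros(n), np.eye(n) + tau2))
print(Z_hat, Z_exact)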

However, when toying with a Normal-Normal model and calibrating the Cauchy scale to fit the actual posterior, the estimated evidence, 5×10⁻⁵, proved much smaller than the actual one, 8×10⁻².

day five at ISBA 22

Posted in Mountains, pictures, Running, Statistics, Travel, University life on July 4, 2022 by xi'an

Woke up even earlier today! Which left me time to work on switching to Leonard Cohen's song titles for my slide frametitles this afternoon (last talk of the whole conference!), and to run once again to Mon(t) Royal as all pools are closed (Happy Canada Day! except to "freedom convoy" antivaxxxers). Which led to me meeting a raccoon by the side of the path (and morons feeding wildlife).

Had an exciting time at the morning session, where Giacomo Zanella (formerly Warwick) talked on a mixture approach to leave-one-out predictives, with a pseudo-harmonic mean representation, averaging inverse density across all observations. Better than harmonic? Some assumptions allow for finite variance, although I am missing the deep argument (in part due to Giacomo's machine-gun delivery pace!). Then Alicia Corbella (Warwick) presented a promising entry into PDMP by proposing an automated zig-zag sampler, pointing on the side to Joris Bierkens' webpage on the state-of-the-art PDMP methodology. In this approach, joint with my other Warwick colleagues Simon Spencer and Gareth Roberts, the zig-zag sampler relies on automatic differentiation, sub-sampling, and bound derivation, with "no further information on the target needed". And finally Chris Carmona presented a joint work with Geoff Nicholls that merges cut posteriors and variational inference to create a meta posterior. Work and talk were motivated by a nice medieval linguistic problem where the latent variables impact the (convergence of the) MCMC algorithm [as in our k-nearest neighbour experience], interestingly using normalising [neural spline] flows. The pseudo-posterior seems to depend very much on their modularization rate η, which penalises how much one module influences the next one.
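On the first talk, the plain inverse-density average that the mixture approach improves upon follows from a classical identity, p(y_i|y_{−i}) = 1/E[1/p(y_i|θ)] under the full posterior, sketched below on a conjugate Normal model (this is not Zanella's mixture estimator, merely its harmonic baseline):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, tau2 = 20, 4.0
y = rng.normal(1.0, 1.0, n)
s2 = 1.0 / (n + 1.0 / tau2)
mu = s2 * y.sum()
theta = rng.normal(mu, np.sqrt(s2), 100_000)  # exact posterior sample here

i = 0
# harmonic identity: p(y_i | y_{-i}) = 1 / E_post[ 1 / p(y_i | theta) ]
loo_hat = 1.0 / np.mean(1.0 / stats.norm.pdf(y[i], theta, 1.0))

# exact leave-one-out predictive for the conjugate model, for comparison
s2_ = 1.0 / (n - 1 + 1.0 / tau2)
mu_ = s2_ * (y.sum() - y[i])
loo_exact = stats.norm.pdf(y[i], mu_, np.sqrt(1.0 + s2_))
print(loo_hat, loo_exact)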

In the aft, I attended, sort of by chance [due to a missing speaker in the copula session], the end of a session on migration modelling, with a talk by Jason Hilton and Martin Hinsch focussing on the 2015 mass exodus of Syrians through the Mediterranean, away from the joint evils of al-Assad and ISIS. As this was a tragedy whose modelling I had vainly tried to contribute to, I was obviously captivated and frustrated (learning of the IOM Missing Migrants Project!). Fitting the agent-based model was actually done using ABC, and most particularly our ABC-PMC!!!

My own and final session had Gareth (Warwick) presenting his recent work with Jun Yang and Krzys Łatuszyński (Warwick) on the stereographic projection improvement over regular MCMC, which involves turning the target into a distribution supported by an hypersphere, hence considering a distribution with compact support and higher efficiency. Krzys had explained the principle while driving back from Gregynog two months ago. The idea is somewhat similar to our origaMCMC, which I presented at MCqMC 2016 in Stanford (and never completed), except that our projection was inside a ball. Looking forward to the adaptive version, in the making!
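As a minimal sketch of the underlying map, here is the inverse stereographic projection sending R^d onto the unit hypersphere S^d (just the change of support, not the samplers themselves; function names are illustrative):

import numpy as np

def to_sphere(x):
    """Inverse stereographic projection of x in R^d onto the unit sphere S^d,
    minus its north pole (0, ..., 0, 1)."""
    r2 = np.dot(x, x)
    return np.append(2.0 * x, r2 - 1.0) / (r2 + 1.0)

def to_plane(z):
    """Stereographic projection back from S^d to R^d."""
    return z[:-1] / (1.0 - z[-1])

# the image has compact support: distant points in R^d cluster near the pole
x = np.array([3.0, -4.0])
z = to_sphere(x)
print(np.dot(z, z))                 # 1.0: the image lies on the sphere
print(np.allclose(to_plane(z), x))  # True: the two maps are mutual inverses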

And to conclude this subjective journal from the ISBA conference, borrowing this title by (Westmount born) Leonard Cohen, "Hey, that's no way to say goodbye"… To paraphrase Bilbo Baggins, I have not interacted with at least half the participants half as much as I would have liked. But this was still a reunion, albeit in the new Normal. Hopefully, the conference will not have induced a massive COVID cluster on top of numerous scientific and social exchanges! The following days will tell. Congrats to the ISBA 2022 organisers for achieving a most successful event in these times of uncertainty. And looking forward to the next edition in 2024 at Ca' Foscari, Venezia!!!