Archive for harmonic mean estimator

reciprocal importance sampling

Posted in Books, pictures, Statistics on May 30, 2023 by xi'an

In a recent arXival, Metodiev et al. (including my friend Adrian Raftery, who is spending the academic year in Paris) proposed a new version of reciprocal importance sampling, expanding the proposal we made with Darren Wraith (2009) of using a Uniform over an HPD region. It is called THAMES, for truncated harmonic mean estimator, hence the picture (of London, not Paris!).

“…[Robert and Wraith (2009)] method has not yet been fully developed for realistic, higher-dimensional situations. For example, we know of no simple way to compute the volume of the convex hull of a set of points in higher dimensions.”

They suggest replacing the convex hull of the HPD points with an ellipsoid ϒ derived from a Normal distribution centred at the highest of the HPD points, whose covariance matrix is estimated from the whole (?) posterior sample. Which is somewhat surprising in that this ellipsoid may well include low probability regions when the posterior is multimodal. For instance, the estimator is biased when the posterior vanishes on parts of ϒ. And the finiteness of its variance is unclear, depending on how fast the posterior decays to zero on these parts.

The central feature of the paper is selecting the radius of the ellipsoid that minimises the variance of the (reciprocal) evidence estimator, under asymptotic normality of the posterior. This radius roughly corresponds to our HPD region in that 50% of the sample stands within the ellipsoid. The authors also notice that separate samples should be used to estimate the ellipsoid and to estimate the evidence, and that a correction is necessary when the posterior support is restricted. (Examples do not include multimodal targets, apparently.)
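For concreteness, here is a minimal Python sketch of such a truncated harmonic mean (reciprocal importance sampling) estimator, with a Uniform instrumental density over a sample-calibrated ellipsoid; the sample split, the 50% radius, and every implementation detail below are my own assumptions, not the authors' exact recipe.

import numpy as np
from scipy.special import gammaln, logsumexp

def truncated_hm_log_evidence(theta, log_post):
    # theta: (N, d) posterior draws (d >= 2); log_post: unnormalised
    # log posterior (log prior + log likelihood) at those draws
    N, d = theta.shape
    fit, use = slice(0, N // 2), slice(N // 2, N)   # separate samples
    mu = theta[fit].mean(axis=0)
    Sigma = np.cov(theta[fit], rowvar=False)
    Sinv = np.linalg.inv(Sigma)
    maha = np.einsum('ni,ij,nj->n', theta - mu, Sinv, theta - mu)
    r2 = np.quantile(maha[fit], 0.5)   # ellipsoid holding ~50% of the draws
    inside = maha[use] <= r2
    # volume of A = {theta : (theta - mu)' Sinv (theta - mu) <= r2}
    log_vol = (d / 2) * np.log(np.pi * r2) - gammaln(d / 2 + 1) \
              + 0.5 * np.linalg.slogdet(Sigma)[1]
    # 1/Z = E_pi[ U_A(theta) / pi~(theta) ], estimated on the second half
    log_w = -log_vol - log_post[use][inside]
    return -(logsumexp(log_w) - np.log(N - N // 2))   # log evidence

The finiteness of the variance then hinges on the posterior staying away from zero inside the ellipsoid, which is exactly the multimodality caveat above.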

back to a correction of the harmonic mean estimator

Posted in Books, Statistics on May 11, 2023 by xi'an

In a 2009 JCGS paper, Peter Lenk proposed a bias correction of the harmonic mean estimator, which is somewhat surprising given that the estimator usually has an infinite variance, hence a purely formal consistency, since no rate of convergence can be taken for granted. In particular, the conjugate Normal model serving as a motivation leads to an infinite variance. The author however blames the poor behaviour of the harmonic mean estimator on the overly concentrated support of the posterior distribution, while having no reservation about the original identity (with standard notations)

m(x)^{-1} = \int \dfrac{\pi(\theta|x)}{f(x|\theta)}\,\text d \theta

but suggesting the corrected

m(x)^{-1} = \int_A \dfrac{\pi(\theta|x)}{f(x|\theta)}\,\text d \theta\big/ \Pi(A)

although this is only true when A is within the support of the posterior. (In which case it connects with our own 2009 correction.) The author then opts for a set A corresponding to a “simulation support” of the posterior, a notion with a rather vague meaning, if somewhat connected with the nested sampling starting set.
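As a quick illustration, here is a minimal sketch (my own construction, not Lenk's implementation) of the corrected identity, restricting the harmonic mean to a set A and dividing by the prior mass Π(A), here estimated by prior simulation:

import numpy as np
from scipy.special import logsumexp

def corrected_hm_log_evidence(log_lik, in_A_post, in_A_prior):
    # log_lik: log f(x|theta_i) at the N posterior draws theta_i
    # in_A_post: booleans, posterior draws falling within A
    # in_A_prior: booleans, prior draws falling within A
    N = len(log_lik)
    # restricted harmonic mean: (1/N) sum_{theta_i in A} 1 / f(x|theta_i)
    log_hm = logsumexp(-log_lik[in_A_post]) - np.log(N)
    log_prior_mass = np.log(np.mean(in_A_prior))   # Pi(A) by Monte Carlo
    return log_prior_mass - log_hm                 # log m(x)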

ABConic mean evidence approximation

Posted in Statistics on March 7, 2023 by xi'an

Following a question on X validated about evidence approximation in ABC settings, i.e., on returning an approximation of the evidence based on the outputs of parallel ABC runs for the models under comparison, I wondered at the relevance of a harmonic mean estimator in that context.

Rather than using the original ABC algorithm that proposes a model, a parameter from that model, and a simulated dataset from that model with that parameter, an alternate, cost-free solution would be to run an ABC version of the harmonic mean evidence approximation à la Newton & Raftery (1994). Since

\mathcal Z=1\Big/\int \dfrac{\pi(\theta|D)}{p(D|\theta)}\,\text d\theta

the evidence can formally be approximated by

\hat{\mathcal Z} =1\Big/\frac{1}{N}\sum_{i=1}^N\frac{1}{p(D|\theta_i)}\qquad\theta_i\sim\pi(\theta|D)

and its ABC version is

\hat{\mathcal Z} =1\Big/\frac{1}{N}\sum_{i=1}^N\frac{1}{K_\epsilon(d(D,D^\star(\theta_i)))}\qquad\theta_i\sim\pi^\epsilon(\theta|D)

where Kε(.) is the kernel used for the ABC acceptance/rejection step and d(.,.) is the distance used to measure the discrepancy between samples. Since the kernel values are already computed during the ABC run, the additional cost is null. Obviously, an indicator kernel does not return a useful estimate, but something like a Cauchy kernel could do.

However, when toying with a normal-normal model and calibrating the Cauchy scale to fit the actual posterior, as in the above graph, the estimated evidence, 5×10⁻⁵, proved much smaller than the actual one, 8×10⁻².
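For the curious reader, here is a toy Python reconstruction along those lines; the prior, datum, kernel scale, and sample size are illustrative guesses of mine, not the settings behind the reported figures.

import numpy as np
from scipy.stats import norm

# toy normal-normal model: x|theta ~ N(theta, 1), theta ~ N(0, 10)
rng = np.random.default_rng(1)
D, N, eps = 1.5, 10**5, 0.3            # datum, ABC sample size, kernel scale

theta = rng.normal(0, np.sqrt(10), N)  # prior draws
Dstar = rng.normal(theta, 1)           # pseudo-data
dist = np.abs(D - Dstar)               # discrepancy d(D, D*)
K = 1 / (np.pi * eps * (1 + (dist / eps) ** 2))   # Cauchy kernel K_eps

# resample (theta, D*) pairs with weights K to mimic the ABC posterior,
# then take the harmonic mean of the kernel values
idx = rng.choice(N, size=N, replace=True, p=K / K.sum())
Z_hat = 1 / np.mean(1 / K[idx])
print(Z_hat, norm.pdf(D, 0, np.sqrt(11)))   # versus the exact evidence

Note that, by construction, the resampled harmonic mean collapses in expectation to the plain average of the kernel values, i.e., to the standard kernel estimate of the prior predictive density at D, consistent with the null-cost claim.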

day five at ISBA 22

Posted in Mountains, pictures, Running, Statistics, Travel, University life on July 4, 2022 by xi'an

Woke up even earlier today! Which left me time to work on switching to Leonard Cohen’s song titles for my slide frametitles this afternoon (last talk of the whole conference!) and to run once again to Mon(t) Royal, as all pools are closed (Happy Canada Day! except to “freedom convoy” antivaxxxers). Which led to my meeting a raccoon by the side of the path (and morons feeding wildlife).

Had an exciting time at the morning session, where Giacomo Zanella (formerly Warwick) talked on a mixture approach to leave-one-out predictives, with a pseudo-harmonic mean representation, averaging inverse densities across all observations. Better than harmonic? Some assumptions allow for a finite variance, although I am missing the deep argument (in part due to Giacomo’s machine-gun delivery pace!). Then Alicia Corbella (Warwick) presented a promising entry into PDMPs by proposing an automated zig-zag sampler, pointing out on the side Joris Bierkens’ webpage on the state-of-the-art PDMP methodology. In this approach, joint with my other Warwick colleagues Simon Spencer and Gareth Roberts, the zig-zag sampler relies on automatic differentiation, sub-sampling, and bound derivation, with “no further information on the target needed”. And finally Chris Carmona presented a joint work with Geoff Nicholls that merges cut posteriors and variational inference to create a meta posterior. Work and talk were motivated by a nice medieval linguistic problem where the latent variables impact the (convergence of the) MCMC algorithm [as in our k-nearest neighbour experience], interestingly using normalising [neural spline] flows. The pseudo-posterior seems to depend very much on their modularization rate η, which penalises how much one module influences the next one.

In the aft, I attended, sort of by chance [due to a missing speaker in the copula session], the end of a session on migration modelling, with a talk by Jason Hilton and Martin Hinsch focussing on the 2015 mass exodus of Syrians through the Mediterranean, away from the joint evils of al-Assad and ISIS. As this was a tragedy whose modelling I had vainly tried to contribute to, I was obviously captivated and frustrated (learning of the IOM missing migrant project!). Fitting the agent-based model was actually done using ABC, and most particularly our ABC-PMC!!!

My own and final session had Gareth (Warwick) presenting his recent work with Jun Yang and Krys Łatuszyński (Warwick) on the stereographic projection improvement over regular MCMC, which involves turning the target into a distribution supported by a hypersphere, hence considering a distribution with compact support and higher efficiency. Krys had explained the principle while driving back from Gregynog two months ago. The idea is somewhat similar to our origaMCMC, which I presented at MCqMC 2016 in Stanford (and never completed), except that our projection was inside a ball. Looking forward to the adaptive version, in the making!

And to conclude this subjective journal from the ISBA conference, borrowing a title from (Westmount-born) Leonard Cohen, “Hey, that’s no way to say goodbye”… To paraphrase Bilbo Baggins, I have not interacted with at least half the participants half as much as I would have liked. But this was still a reunion, albeit in the new Normal. Hopefully, the conference will not have induced a massive COVID cluster on top of its numerous scientific and social exchanges! The following days will tell. Congrats to the ISBA 2022 organisers for achieving a most successful event in these times of uncertainty. And looking forward to the 2024 edition at Ca’ Foscari, Venezia!!!

Another harmonic mean

Posted in Books, Statistics, University life on May 21, 2022 by xi'an

Yet another paper that addresses the approximation of the marginal likelihood by a truncated harmonic mean, a popular theme of mine: a 2020 paper by Johannes Reich, entitled Estimating marginal likelihoods from the posterior draws through a geometric identity, published in Monte Carlo Methods and Applications.

The geometric identity it aims at exploiting is that

m(x) = \dfrac{\int_A \,\text d\theta}{\int_A \pi(\theta|x)\big/\left\{\pi(\theta)f(x|\theta)\right\}\,\text d\theta}

for any (positive volume) compact set A. This is exactly the same identity as in an earlier, and uncited, 2017 paper by Ana Pajor, with the also quite similar (!) title Estimating the Marginal Likelihood Using the Arithmetic Mean Identity, which I discussed on the ‘Og, linked with another 2012 paper by Lenk. This geometric or arithmetic identity is again related to the harmonic mean correction based on an HPD region A that Darren Wraith and I proposed at MaxEnt 2009, and that Jean-Michel and I presented at Frontiers of statistical decision making and Bayesian analysis in 2010.
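Indeed, the identity is immediate: since π(θ|x) = π(θ)f(x|θ)/m(x), the integrand in the denominator simplifies into 1/m(x) and

\int_A \dfrac{\pi(\theta|x)}{\pi(\theta)f(x|\theta)}\,\text d\theta = \dfrac{1}{m(x)}\int_A \,\text d\theta

which exhibits it as reciprocal importance sampling with a Uniform over A instrumental density, as in the THAMES post above.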

In this avatar, the set A is chosen close to an HPD region, once more, with a structure that allows for an exact computation of its volume, namely an ellipsoid that contains roughly 50% of the simulations from the posterior (rather than our non-intersecting union of balls centred at the 50% HPD points), which assumes a Euclidean structure of the parameter space (or, in other words, depends on the parameterisation). In the mixture illustration, the author surprisingly omits Chib’s solution, despite symmetrised versions avoiding the label (un)switching issues. What I do not get is how this solution gets around the label switching challenge, in that the set A remains an ellipsoid for multimodal posteriors, which means it either corresponds to a single mode [but then how can a simulation be restricted to a “single permutation of the indicator labels”?] or it covers all modes but also the unlikely valleys in-between.
