machine-learning harmonic mean
In a recent arXival, Jason McEwen propose a resurrection of the “infamous” harmonic mean estimator. In Machine learning assisted Bayesian model comparison: learnt harmonic mean estimator, they propose to aim at the “optimal importance function”. The paper provides a fair coverage of the literature on that topic, incl. our 2009 paper with Darren Wraith (although I do not follow the criticism of using a uniform over an HPD region, esp. since one of the learnt targets is also a uniform over an hypersphere, presumably optimised in terms of the chosen parameterisation).
“…the learnt harmonic mean estimator, a variant of the original estimator that solves its large variance problem. This is achieved by interpreting the harmonic mean estimator as importance sampling and introducing a new target distribution (…) learned to approximate the optimal but inaccessible target, while minimising the variance of the resulting estimator. Since the estimator requires samples of the posterior only it is agnostic to the strategy used to generate posterior samples.”
The method thus builds upon Gelfand and Dey (1994) general proposal that is a form of inverse importance sampling since the numerator [the new target] is free while the denominator is the unnormalised posterior. The optimal target being the complete posterior (since it lead to a null variance), the authors propose to try to approximate this posterior by various means. (Note however that an almost Dirac mass at a value with positive posterior would work as well!, at least in principle…) as the sections on moment approximations sound rather standard (and assume the estimated variances are finite) while the reason for the inclusion of the Bayes factor approximation is rather unclear. However, I am rather skeptical at the proposals made therein towards approximating the posterior distribution, from a Gaussian mixture [for which parameterisation?] to KDEs, or worse ML tools like neural nets [not explored there, which makes one wonder about the title], as the estimands will prove very costly, and suffer from the curse of dimensionality (3 hours for d=2¹⁰…).The Pima Indian women’s diabetes dataset and its quasi-Normal posterior are used as a benchmark, meaning that James and Nicolas did not shout loud enough! And I find surprising that most examples include the original harmonic mean estimator despite its complete lack of trustworthiness.
Leave a Reply