Archive for nested sampling

Natural nested sampling

Posted in Books, Statistics, University life on May 28, 2023 by xi'an

“The nested sampling algorithm solves otherwise challenging, high-dimensional integrals by evolving a collection of live points through parameter space. The algorithm was immediately adopted in cosmology because it partially overcomes three of the major difficulties in Markov chain Monte Carlo, the algorithm traditionally used for Bayesian computation. Nested sampling simultaneously returns results for model comparison and parameter inference; successfully solves multimodal problems; and is naturally self-tuning, allowing its immediate application to new challenges.”

I came across a review on nested sampling in Nature Reviews Methods Primers of May 2022, with a large number of contributing authors, some of whom I knew from earlier papers in astrostatistics. As illustrated by the above quote from the introduction, the tone is definitely optimistic about the capacities of the method, reproducing the original argument that the evidence is the expectation of the likelihood L(θ) under the prior. Which representation, while valid, does not translate into a dimension-free methodology since parameters θ still need to be simulated.
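For the record, here is that representation together with the one-dimensional reformulation nested sampling relies upon, in standard notations (nothing specific to the Primer):

m(x) = \int L(\theta)\,\pi(\theta)\,\text d \theta = \int_0^1 L(X)\,\text d X

where X(λ) = Π({θ : L(θ) > λ}) denotes the prior mass above likelihood level λ and L(X) its inverse: the right-hand integral is one-dimensional, but producing points within the associated likelihood level sets still takes place in the original θ space.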

“Nested sampling lies in a class of algorithms that form a path of bridging distributions and evolves samples along that path. Nested sampling stands out because the path is automatic and smooth — compression along log X by, on average, 1/n at each iteration — and because the path is compressed through constrained priors, rather than from the prior to the posterior. This was a motivation for nested sampling as it avoids phase transitions — abrupt changes in the bridging distributions — that cause problems for other methods, including path samplers, such as annealing.”

The elephant in the room is eventually processed, namely the simulation from the prior constrained to the likelihood level sets, which in my experience (with, e.g., mixture posteriors) proves most time consuming. This stems from the fact that these level sets are notoriously difficult to evaluate from a given sample: all points stand within the set but they hardly provide any indication of the boundaries of said set… Region sampling requires constructing a region that bounds the likelihood level set, which in turn requires some knowledge of the likelihood variations to have a chance of remaining efficient, incl. in cosmological applications, while regular MCMC steps require an increasing number of iterations as the constraint gets tighter and tighter, for otherwise the move essentially amounts to duplicating a live particle.
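As a toy illustration of where the cost concentrates, here is a minimal nested sampling sketch in Python, with naive rejection from the prior standing in for the constrained simulation step; the likelihood, prior and all settings are illustrative and not taken from the Primer.

import numpy as np

rng = np.random.default_rng(0)
d, n_live, n_iter = 2, 100, 500

def log_likelihood(theta):                     # toy Gaussian likelihood
    return -0.5 * np.sum(theta**2) - 0.5 * d * np.log(2 * np.pi)

def sample_prior():                            # uniform prior on [-5, 5]^d
    return rng.uniform(-5.0, 5.0, size=d)

live = [sample_prior() for _ in range(n_live)]
live_logL = np.array([log_likelihood(t) for t in live])

logZ, logX = -np.inf, 0.0                      # running evidence and log prior volume
for i in range(n_iter):
    worst = int(np.argmin(live_logL))          # discard the lowest-likelihood live point
    logL_min = live_logL[worst]
    logX_new = -(i + 1) / n_live               # average compression of 1/n_live per iteration
    logZ = np.logaddexp(logZ, logL_min + np.log(np.exp(logX) - np.exp(logX_new)))
    logX = logX_new
    # the expensive step: simulate from the prior constrained to {L > L_min};
    # naive rejection degenerates as the constraint tightens, hence region or MCMC moves
    while True:
        theta = sample_prior()
        logL = log_likelihood(theta)
        if logL > logL_min:
            break
    live[worst], live_logL[worst] = theta, logL

# add the contribution of the remaining live points and prior volume
logZ = np.logaddexp(logZ, np.logaddexp.reduce(live_logL) - np.log(n_live) + logX)
print("log evidence estimate:", logZ, " (truth is about", np.log(1 / 100), ")")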

back to a correction of the harmonic mean estimator

Posted in Books, Statistics on May 11, 2023 by xi'an

In a 2009 JCGS paper, Peter Lenk proposed a bias correction of the harmonic mean estimator, which is somewhat surprising given that this estimator usually has no finite variance and hence that its consistency is purely formal, since no rate of convergence can be taken for granted. In particular, the conjugate Normal model serving as a motivation leads to an infinite variance. The author however blames the poor behaviour of the harmonic mean estimator on the overly concentrated support of the posterior distribution, while having no reservation about the original identity (with standard notations)

m(x)^{-1} = \int \dfrac{\pi(\theta|x)}{f(x|\theta)}\,\text d \theta

but suggesting the corrected

m(x)^{-1} = \int_A \dfrac{\pi(\theta|x)}{f(x|\theta)}\,\text d \theta\big/ \Pi(A)

although this identity only holds when A is within the support of the posterior. (In which case it connects with our own 2009 correction.) The author then opts for a set A corresponding to a “simulation support” of the posterior, a rather vaguely defined notion, if somewhat connected with the nested sampling starting set.
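As a quick numerical check of the corrected identity (with an arbitrary HPD-type choice of A, not Lenk's own simulation support), here is a toy conjugate Normal example where both m(x) and Π(A) are available in closed form; all names and settings below are mine.

# Corrected harmonic mean estimate of 1/m(x) over a posterior HPD set A,
# for the conjugate model x|theta ~ N(theta,1), theta ~ N(0,1); restricting
# the integral to A bounds 1/f(x|theta) and removes the infinite-variance issue.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = 1.5                                    # single observation
post_mean, post_sd = x / 2, np.sqrt(0.5)   # posterior N(x/2, 1/2)

# A = central 50% posterior interval (an HPD region for a Normal posterior)
half = norm.ppf(0.75) * post_sd
lo, hi = post_mean - half, post_mean + half
prior_A = norm.cdf(hi) - norm.cdf(lo)      # Pi(A) under the N(0,1) prior

theta = rng.normal(post_mean, post_sd, size=100_000)   # posterior sample
in_A = (theta > lo) & (theta < hi)
inv_lik = np.where(in_A, 1.0 / norm.pdf(x, loc=theta, scale=1.0), 0.0)
inv_m_hat = inv_lik.mean() / prior_A       # Monte Carlo estimate of 1/m(x)

true_m = norm.pdf(x, loc=0.0, scale=np.sqrt(2.0))      # marginally, x ~ N(0, 2)
print("estimate of m(x):", 1 / inv_m_hat, " true m(x):", true_m)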

slice samplers for nested sampling

Posted in Books, Statistics, University life on March 6, 2023 by xi'an

“…the likelihoods of discarded points have interesting properties. In particular, the fraction of prior mass below the likelihood threshold is approximately 1/K [number of particles].”

I came across a newly arXived paper on nested sampling, written by Johannes Buchner, with a focus on sampling over the constrained space defined by the lower bound on the likelihood value, and promoting different ways to implement a slice sampler in this possibly complex space.

“After a number of Metropolis steps, by which points with lower likelihood than required are not visited, a useful independent prior sample is obtained. This is only the case if enough steps are made, such that the random walk can reach all of the relevant volume.”

A slice sampler for nested sampling means (1) picking a direction v and (2) deriving the length of the slice by bisection, before (3) sampling uniformly over the resulting interval. I do not get why the slice should necessarily be connected, rather than made of several segments for multimodal likelihoods. Ten versions of the direction selection are compared, some of which miss the detailed balance property.
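For concreteness, here is a stripped-down constrained slice move along a random direction, using stepping-out and shrinkage (à la Neal) rather than bisection, and restricted to a unit-cube prior; this is a generic sketch, not one of the ten versions compared in the paper.

import numpy as np

rng = np.random.default_rng(2)

def constrained_slice_step(theta, log_lik, logL_min, step=0.1, max_expand=20):
    """One slice move within {theta in [0,1]^d : log_lik(theta) > logL_min}."""
    def inside(p):
        return np.all((p >= 0.0) & (p <= 1.0)) and log_lik(p) > logL_min

    v = rng.normal(size=theta.size)
    v /= np.linalg.norm(v)                 # (1) random unit direction
    left, right = -step, step
    for _ in range(max_expand):            # (2) expand until the left end leaves the region
        if not inside(theta + left * v):
            break
        left -= step
    for _ in range(max_expand):            #     same for the right end
        if not inside(theta + right * v):
            break
        right += step
    while True:                            # (3) uniform draw on the bracket, with shrinkage
        u = rng.uniform(left, right)
        prop = theta + u * v
        if inside(prop):
            return prop
        if u < 0:
            left = u
        else:
            right = u

# example: one move inside {log L > -1} for a toy Gaussian log-likelihood centred at 0.5
toy = lambda t: -0.5 * np.sum(((t - 0.5) / 0.1) ** 2)
print(constrained_slice_step(np.full(3, 0.5), toy, -1.0))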

The comparison between these different slice samplers makes use of a shrinkage test proposed by the author in Statistics & Computing (2014), which monitors convergence by evaluating the volume-ratio distribution of the sequence of discarded samples produced by nested sampling, namely a test (on which I had reservations, blogged at the time) for the decrease in volume predicted by the Uniform order statistics. Now, I have trouble understanding the calibration figures that are at the core of the paper towards ranking the ten versions…
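For reference, the order-statistics fact the test builds upon is the standard nested sampling shrinkage property: with K live points uniformly distributed over the current constrained prior, the remaining volume ratio after discarding the worst point is the largest of K Uniforms, i.e.

t_i = X_i/X_{i-1} \sim \text{Beta}(K,1) \qquad \text{with} \qquad \text E[-\log t_i] = 1/K,

and the 2014 test checks estimated volume ratios of the discarded samples against this benchmark.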

evidence estimation in finite and infinite mixture models

Posted in Books, Statistics, University life on May 20, 2022 by xi'an

Adrien Hairault (PhD student at Dauphine), Judith and I just arXived a new paper on evidence estimation for mixtures. This may sound like a well-trodden path that I have repeatedly explored in the past, but methinks that estimating the model evidence doth remain a notoriously difficult task for large samples or many-component finite mixtures, and even more so for “infinite” mixture models corresponding to a Dirichlet process. The different Monte Carlo techniques advocated in the past, like Chib’s (1995) method, SMC, or bridge sampling, exhibit a wide range of performances, in particular in terms of computing time… One novel (?) approach in the paper is to write Chib’s (1995) identity for partitions rather than parameters, as it bypasses the label switching issue (as we already noted in Hurn et al., 2000). Another is to exploit Geyer’s (1991-1994) reverse logistic regression technique in the more challenging Dirichlet mixture setting, and yet another is a sequential importance sampling solution à la Kong et al. (1994), as also noticed by Carvalho et al. (2010). [We did not cover nested sampling as it quickly becomes onerous.]
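As a reminder of the starting point, Chib's (1995) identity is merely Bayes' theorem turned around, evaluated at an arbitrary (usually high-posterior) value θ*, the paper's variant substituting a latent partition of the sample for θ* (my loose paraphrase):

m(x) = \dfrac{f(x|\theta^*)\,\pi(\theta^*)}{\pi(\theta^*|x)}

with the denominator estimated from MCMC output, which is where label switching usually interferes in mixture settings.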

Applications are numerous. In particular, testing for the number of components in a finite mixture model, or for the fit of a finite mixture model to a given dataset, has long been and still is an issue of much interest and diverging opinions, albeit still missing a fully satisfactory resolution. Using a Bayes factor to select the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish the consistency of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric ‘strongly identifiable’ Dirichlet Process Mixture (DPM) model.

likelihood-free nested sampling

Posted in Books, Statistics on April 11, 2022 by xi'an

Last week, I came by chance across a paper by Jan Mikelson and Mustafa Khammash on a likelihood-free version of nested sampling (a popular keyword on the ‘Og!), published in 2020 in PLoS Comput Biol. The setup is a parameterised hidden state-space model, which allows for an approximation of the (observed) likelihood function L(θ|y) by means of a particle filter. An immediate issue with this proposal is that a new filter needs to be produced for each new value of the parameter θ, which makes it enormously expensive. It then gets more bizarre as the [Monte Carlo] distribution of the particle filter approximation ô(θ|y) is agglomerated with the original prior π(θ) into a joint “prior” [despite depending on the observed y] and nested sampling is conducted with level sets of the form

ô(θ|y)>ε.

Actually, if the Monte Carlo error were null, that is, if the number of particles were infinite, then

ô(θ|y)=L(θ|y)

and the algorithm would indeed reduce to the original nested sampler. Simulation from the restricted region is handled by constructing an extra density estimator of the constrained distribution (in θ)…
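To make the construction concrete, here is a minimal sketch of the key ingredient, a bootstrap particle filter estimate of the likelihood for a toy linear Gaussian state-space model, together with the level-set test applied to that noisy estimate; all names, models and settings are illustrative and this is not the authors' implementation (their outer loop otherwise resembles a standard nested sampling run).

import numpy as np

rng = np.random.default_rng(3)

# toy state-space model: z_t = a*z_{t-1} + N(0, sig_z^2), y_t = z_t + N(0, sig_y^2)
T, sig_z, sig_y, a_true = 50, 0.5, 0.5, 0.8
z, y = 0.0, np.empty(T)
for t in range(T):
    z = a_true * z + rng.normal(0.0, sig_z)
    y[t] = z + rng.normal(0.0, sig_y)

def pf_loglik(a, n_part=200):
    """Bootstrap particle filter estimate of the log-likelihood of parameter a."""
    parts = np.zeros(n_part)                   # particles start at z_0 = 0
    ll = 0.0
    for t in range(T):
        parts = a * parts + rng.normal(0.0, sig_z, n_part)        # propagate
        logw = -0.5 * ((y[t] - parts) / sig_y) ** 2 - np.log(sig_y * np.sqrt(2 * np.pi))
        m = logw.max()
        ll += m + np.log(np.mean(np.exp(logw - m)))               # log of average weight
        w = np.exp(logw - m)
        parts = parts[rng.choice(n_part, n_part, p=w / w.sum())]  # multinomial resampling
    return ll

# within a nested sampling iteration, a proposal a ~ prior is kept only if the *estimated*
# likelihood clears the current threshold, i.e. the level set is defined on the noisy estimate
eps = pf_loglik(0.5)               # threshold, e.g. the estimate at a discarded live point
a_prop = rng.uniform(0.0, 1.0)     # Uniform(0,1) prior on a
print("accepted:", pf_loglik(a_prop) > eps)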

“We have shown how using a Monte Carlo estimate over the livepoints not only results in an unbiased estimator of the Bayesian evidence Z, but also allows us to derive a formulation for a lower bound on the achievable variance in each iteration (…)”

As shown by the above quote, the authors insist on the unbiasedness of the particle approximation, but since nested sampling does not produce an unbiased estimator of the evidence Z, the point is somewhat moot. (I am also rather surprised by the reported lack of computing-time benefit when running ABC-SMC.)
