Archive for Multinest

slice samplers for nested sampling

Posted in Books, Statistics, University life on March 6, 2023 by xi'an

“…the likelihoods of discarded points have interesting properties. In particular, the fraction of prior mass below the likelihood threshold is approximately 1/K [number of particles].”

I came across a newly arXived paper on nested sampling, written by Johannes Buchner, with a focus on sampling over the constrained space defined by the lower bound on the likelihood value, and promoting different manners to implement a slice sampler in this possibly complex space.

“After a number of Metropolis steps, by which points with lower likelihood than required are not visited, a useful independent prior sample is obtained. This is only the case if enough steps are made, such that the random walk can reach all of the relevant volume.”

A slice sampler for nested sampling means (1) picking a direction v, (2) deriving the length of the slice by bisection, and (3) sampling uniformly over the resulting interval. I do not get why the slice should necessarily be connected, rather than made of several segments for multimodal likelihoods. Ten versions are compared for selecting the direction! With some missing the detailed balance property.
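The three steps above can be sketched as a generic slice move within the likelihood-constrained set, here with Neal-style stepping-out and shrinkage rather than any of the paper's ten specific direction proposals (a minimal sketch of mine, assuming the constrained set is bounded along the chosen direction):

```python
import numpy as np

def slice_step(x, logL, logL_min, rng, w=0.5):
    """One slice-sampling move inside {x : logL(x) > logL_min}:
    (1) pick a random direction, (2) step the interval out until both
    ends leave the constrained set, (3) sample uniformly with shrinkage."""
    v = rng.standard_normal(x.size)
    v /= np.linalg.norm(v)              # isotropic unit direction
    lo = -w * rng.random()              # random placement of the initial
    hi = lo + w                         # interval of width w around x
    while logL(x + lo * v) > logL_min:  # step out the lower end
        lo -= w
    while logL(x + hi * v) > logL_min:  # step out the upper end
        hi += w
    while True:                         # uniform draw with shrinkage
        t = rng.uniform(lo, hi)
        y = x + t * v
        if logL(y) > logL_min:
            return y
        lo, hi = (t, hi) if t < 0 else (lo, t)
```

Note that the shrinkage loop preserves uniformity over the slice only when the slice is connected, which is precisely the multimodal worry raised above.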

The comparison between these different slice samplers makes use of a shrinkage test proposed by the author in Statistics & Computing (2014), monitoring convergence by evaluating the volume ratio distribution of a sequence of discarded samples produced by nested sampling. Namely a test (on which I had reservations, blogged at the time) for following the decrease in volume predicted by the Uniform order statistics. Now, I have trouble understanding the calibration figures (like the one above) that are at the core of the paper towards ranking the ten versions…
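My understanding of the shrinkage test, sketched under the assumption that the monitored quantity is the per-iteration volume ratio with K live points, so that each ratio is distributed as the maximum of K uniforms, i.e. Beta(K,1), and its K-th power is itself Uniform(0,1):

```python
import numpy as np
from scipy.stats import kstest

def shrinkage_test(volume_ratios, K):
    """Check that each volume ratio V_{i+1}/V_i behaves as a Beta(K, 1)
    variate, as predicted by the Uniform order statistics: its K-th
    power should then be U(0,1), which a Kolmogorov-Smirnov test checks."""
    u = np.asarray(volume_ratios) ** K
    return kstest(u, "uniform")

# simulate ideally distributed shrinkage factors: they should pass
rng = np.random.default_rng(1)
K = 400
t = rng.beta(K, 1, size=1000)
stat, pval = shrinkage_test(t, K)
```

A sampler that fails to explore the constrained set uniformly shrinks the volume faster or slower than this Beta prediction, which the test is meant to pick up.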

dynamic nested sampling for stars

Posted in Books, pictures, Statistics, Travel on April 12, 2019 by xi'an

In the sequel of earlier nested sampling packages, like MultiNest, Joshua Speagle has written a new package called dynesty that manages dynamic nested sampling, primarily intended for astronomical applications. Which is the field where nested sampling is the most popular. One of the first remarks in the paper is that nested sampling can be more easily implemented by using a Uniform reparameterisation of the prior, that is, a reparameterisation that turns the prior into a Uniform over the unit hypercube. Which means in fine that the prior distribution can be generated from a fixed vector of uniforms and known transforms. Maybe not such an issue given that this is the prior after all. The author considers that this makes sampling under the likelihood constraint a much simpler problem, but it all depends in the end on the concentration of the likelihood within the unit hypercube. And on the ability to reach the higher likelihood slices. I did not see any special trick when looking at the documentation, but reflected on the fundamental connection between nested sampling and this ability. As in the original proposal by John Skilling (2006), the slice volumes are “estimated” by simulated Beta order statistics, with no connection with the actual sequence of simulation or the problem at hand. We did point out our incomprehension with such a scheme in our Biometrika paper with Nicolas Chopin. As in earlier versions, the algorithm attempts to visualise the slices by different bounding techniques, before proceeding to explore the bounded regions by several exploration algorithms, including HMC.
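The Uniform reparameterisation amounts to a coordinate-wise inverse-CDF map from the unit hypercube to the prior. A minimal illustration for a made-up independent prior (the function name echoes dynesty's prior_transform convention, but the particular prior is mine):

```python
import numpy as np
from scipy.stats import norm

def prior_transform(u):
    """Map a point u of the unit hypercube to the prior, coordinate-wise,
    via inverse CDFs; here an (illustrative) N(0, 3^2) prior on x[0]
    and a U(-5, 5) prior on x[1]."""
    x = np.empty_like(u)
    x[0] = norm.ppf(u[0], loc=0.0, scale=3.0)  # inverse Normal CDF
    x[1] = -5.0 + 10.0 * u[1]                  # inverse Uniform CDF
    return x
```

Sampling the prior then reduces to sampling uniforms, and the likelihood constraint becomes a constraint on a subset of the hypercube, whose shape depends entirely on how the likelihood concentrates after the transform.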

“As with any sampling method, we strongly advocate that Nested Sampling should not be viewed as being strictly “better” or “worse” than MCMC, but rather as a tool that can be more or less useful in certain problems. There is no “One True Method to Rule Them All”, even though it can be tempting to look for one.”

When introducing the dynamic version, the author lists three drawbacks for the static (original) version. One is the reliance on this transform of a Uniform vector over a hypercube. Another one is that the overall runtime is highly sensitive to the choice of the prior. (If simulating from the prior rather than from an importance function, as suggested in our paper.) A third one is that nested sampling is impervious to the final goal, evidence approximation versus posterior simulation, i.e., uses a constant rate of prior integration. The dynamic version simply modifies the number of points simulated in each slice. According to the (relative) increase in evidence provided by the current slice, estimated through iterations. This makes nested sampling a sort of inverted Wang-Landau since it sharpens the difference between slices. (The dynamic aspects of estimating the volumes of the slices and the stopping rule may hinder convergence in unclear ways, which is not discussed by the paper.) Among the many examples produced in the paper, a 200-dimensional Normal target, which is an interesting object for posterior simulation in that most of the posterior mass rests on a ring away from the maximum of the likelihood. But it does not seem to merit a mention in the discussion. Another example of heterogeneous regression favourably compares dynesty with MCMC in terms of ESS (but fails to include an HMC version).
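A rough sketch of the evidence increments that drive the dynamic allocation, under the usual deterministic volume approximation X_i = exp(-i/K); the allocation rule itself is Speagle's and is not reproduced here:

```python
import numpy as np

def evidence_increments(logL_dead, K):
    """Deterministic-volume approximation X_i = exp(-i/K): the evidence
    increment of slice i is L_i * (X_{i-1} - X_i). A dynamic scheme then
    adds live points over the likelihood range where these (normalised)
    increments concentrate."""
    i = np.arange(1, len(logL_dead) + 1)
    X = np.exp(-i / K)                        # remaining prior volumes
    Xprev = np.concatenate(([1.0], X[:-1]))
    dZ = np.exp(logL_dead) * (Xprev - X)      # per-slice evidence gain
    return dZ / dZ.sum()                      # relative importance
```

Under this reading, targeting the evidence versus the posterior simply means reweighting which slices deserve the extra live points.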

[Breaking News: Although I wrote this post before the exciting first image of the black hole in M87 was made public and hence before I was aware of it, the associated ApJL paper points out that it relies on dynesty for comparing several physical models of the phenomenon by nested sampling.]


nested sampling with a test

Posted in Books, pictures, Statistics, Travel, University life on December 5, 2014 by xi'an

On my way back from Warwick, I read through a couple preprints, including this statistical test for nested sampling algorithms by Johannes Buchner. As it happens, I had already read and commented on it in July! However, without the slightest memory of it (sad, isn’t it?!), I focussed this time much more on the modification proposed to MultiNest than on the test itself, which is in fact a Kolmogorov-Smirnov test applied to a specific target function.

Indeed, when reading the proposed modification of Buchner, I thought of a modification to the modification that sounded more appealing. Without getting back to defining nested sampling in detail, this algorithm follows a swarm of N particles within upper-level sets of the likelihood surface, each step requiring a new simulation above the current value of the likelihood. The remark that set me off this time was that we should exploit the fact that (N-1) particles were already available within this level set. And uniformly distributed therein. Therefore this particle cloud should be exploited as much as possible to return yet another particle distributed just as uniformly as the other ones (!). Buchner proposes an alternative to MultiNest based on a randomised version of the maximal distance to a neighbour and a ball centre picked at random (but not uniformly). But it would be just as feasible to draw a distance from the empirical cdf of the distances to the nearest neighbours or to the k-nearest neighbours. With some possible calibration of k. And somewhat more accurate, because this distribution represents the repartition of the particles within the upper-level set. Although I looked at it briefly in the [sluggish] metro from Roissy airport, I could not figure out a way to account for the additional point to be included in the (N-1) existing particles. That is, how to deform the empirical cdf of those distances to account for an additional point. Unless one included the just-removed particle, which is at the boundary of this upper-level set. (Or rather, which defines the boundary of this upper-level set.) I have no clear intuition as to whether or not this would amount to a uniform generation over the true upper-level set. But simulating from the distance distribution would remove (I think) the clustering effect mentioned by Buchner.
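A sketch of the alternative mused about above, with every choice (the value of k, the centre selection, the uniform direction) being mine for illustration rather than Buchner's:

```python
import numpy as np

def knn_distance_draw(points, k, rng):
    """Propose a new particle by (i) drawing a radius from the empirical
    distribution of k-nearest-neighbour distances among the current live
    points and (ii) placing it at that radius, in a uniform direction,
    from a randomly chosen particle. Illustrative sketch only: this does
    not resolve the (N-1) vs N deformation issue raised in the post."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))       # pairwise distances
    np.fill_diagonal(dist, np.inf)
    knn = np.sort(dist, axis=1)[:, k - 1]     # k-NN distance per point
    r = rng.choice(knn)                       # draw from the empirical cdf
    centre = points[rng.integers(len(points))]
    v = rng.standard_normal(points.shape[1])
    return centre + r * v / np.linalg.norm(v) # uniform direction, radius r
```

Any such proposal would of course still need a likelihood check against the current threshold, and possibly a rejection step, before being accepted into the level set.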

“Other priors can be mapped [into the uniform prior over the unit hypercube] using the inverse of the cumulative prior distribution.”

Hence another illustration of the addictive features of nested sampling! Each time I get back to this notion, a new understanding or reinterpretation comes to mind. In any case, an equally endless source of projects for Master students. (Not that I agree with the above quote, mind you!)

a statistical test for nested sampling

Posted in Books, Statistics, University life on July 25, 2014 by xi'an

A new arXival on nested sampling: “A statistical test for nested sampling algorithms” by Johannes Buchner. The point of the test is to check whether versions of the nested sampling algorithm that fail to guarantee increased likelihood (or nesting) at each step are not missing parts of the posterior mass, and hence producing biased evidence approximations. This applies to MultiNest for instance. This version of nested sampling evaluates the above-threshold region by drawing hyper-balls around the remaining points. A solution which is known to fail in one specific but meaningful case. Buchner’s arXived paper proposes a hyper-pyramid distribution for which the volume of any likelihood-constrained set is known. Hence allowing for a distribution test like Kolmogorov-Smirnov. Confirming the findings of Beaujean and Caldwell (2013). The author then proposes an alternative to MultiNest that is more robust but also much more costly, as it computes distances between all pairs of bootstrapped samples. This solution passes the so-called “shrinkage test”, but it is orders of magnitude less efficient than MultiNest. And it also simply shows that its coverage is fine for a specific target rather than all possible targets. I wonder if a solution to the problem is at all possible, given that evaluating a support or a convex hull is a complex problem whose complexity explodes with the dimension.

arXiv recent postings

Posted in Statistics, University life on June 14, 2013 by xi'an

As I glanced thru the recent arXiv postings in statistics, I found an overload of potential goodies, many more than I could handle in a reasonable time (unless I give up the NYT altogether!)…

For instance, Paulo Marques—a regular ‘Og commenter—wrote about an extension of Chib’s formula when the posterior is approximated in a non-parametric manner. (This was an idea I had toyed with at a time without pursuing it anywhere…) I wonder at the stability of the approximation for two reasons: (i) Chib’s or Bayes’ formula does not require an expectation as the ratio is constant in θ; averaging over realisations of θ could have negative effects on the stability of the approximation (or not); (ii) using a non-parametric estimate will see the performances of Chib’s approximation plummet as the dimension of θ increases, most likely.
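To fix ideas on why the ratio is constant in θ, here is a toy rendition (mine, not Marques’ construction) of Chib’s identity m(x) = f(x|θ)π(θ)/π(θ|x) with the posterior density replaced by a nonparametric kernel estimate, on a conjugate Normal model where the true evidence is available in closed form:

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

# Conjugate toy model: x ~ N(theta, 1), theta ~ N(0, 1), so the
# posterior is N(x/2, 1/2) and the evidence is N(x; 0, 2) exactly.
rng = np.random.default_rng(2)
x = 1.3
draws = rng.normal(x / 2, np.sqrt(0.5), size=20_000)  # posterior sample
theta0 = x / 2                                        # any fixed theta works
kde = gaussian_kde(draws)                             # nonparametric posterior
# Chib's identity: evidence = likelihood * prior / posterior at theta0
m_hat = norm.pdf(x, theta0, 1) * norm.pdf(theta0, 0, 1) / kde(theta0)[0]
m_true = norm.pdf(x, 0, np.sqrt(2))
```

Even in this one-dimensional case the estimate inherits the bias of the kernel bandwidth at θ₀, which is the effect one would expect to worsen quickly with the dimension of θ.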

Another post is by Nogales, Pérez, and Monfort and deals with a Monte Carlo formula for conditional expectations that leaves me more puzzled than excited about a potential application in ABC. Indeed, the idea is to replace the conditioning on the (hard) equality with a soft ball around the equality, with a tolerance level ε. Since the authors do not seem particularly interested in ABC, I do not see the point in this technique. The first example they present does not account for the increasing computational effort as ε decreases. The second part of the paper deals with the best invariant estimator of the position parameter of a general half-normal distribution, using two conditional expectations in Proposition 2. I wonder how the method compares with bridge sampling and MCMC methods. As the paper seems to have gone beyond the Monte Carlo issue at this stage. And focus on this best invariant estimation…
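The soft-ball device, in a minimal form of my own: estimate E[f(Y)|X=x₀] by averaging f over the simulations whose X component falls within ε of x₀, illustrated on a bivariate Normal where the conditional expectation is known in closed form:

```python
import numpy as np

def cond_expect(f, sample_xy, x0, eps):
    """Monte Carlo estimate of E[f(Y) | X = x0] via the soft-ball device:
    average f over the draws whose X component lies within eps of x0."""
    x, y = sample_xy
    keep = np.abs(x - x0) < eps
    return f(y[keep]).mean()

# toy check: (X, Y) bivariate Normal with correlation rho,
# for which E[Y | X = x0] = rho * x0 exactly
rng = np.random.default_rng(3)
rho = 0.8
x = rng.standard_normal(500_000)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(500_000)
est = cond_expect(lambda v: v, (x, y), x0=1.0, eps=0.05)
```

The ABC-flavoured objection above is visible here: shrinking ε reduces the conditioning bias but throws away an ever larger share of the simulations, so the computational effort per effective draw blows up.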

Then Feroz, Hobson, Cameron and Pettitt posted a work on importance nested sampling, on which I need to spend more time given the connection to our nested sampling paper with Nicolas. The paper builds on the MultiNest software developed by Feroz, Hobson and co-authors. (Whose above picture is borrowed from.)

A side post on the moments of the partial non-central chi-square distribution function by Gill, Segura and Temme, since I always liked the specifics of this distribution… No comments so far!
