## nested sampling: any prior anytime?!

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , , , , , on March 26, 2021 by xi'an

A recent arXival by Justin Alsing and Will Handley on “nested sampling with any prior you like” caught my attention. If only because I was under the impression that some priors would not agree with nested sampling. Especially those putting positive weight on some fixed levels of the likelihood function, as well as improper priors.

“…nested sampling has largely only been practical for a somewhat restrictive class of priors, which have a readily available representation as a transform from the unit hyper-cube.”

Reading from the paper, it seems that the whole point is to demonstrate that “any proper prior may be transformed onto the unit hypercube via a bijective transformation.” Which seems rather straightforward if the transform is not otherwise constrained: use a logit transform in every direction. The paper gets instead into the rather fashionable direction of normalising flows as density representations. (Which suddenly reminded me of the PhD dissertation of Rob Cornish at Oxford, which I examined last year. Even though nested was not used there in the same understanding.) The purpose appearing later (in the paper) or in fine to express a random variable simulated from the prior as the (generative) transform of a Uniform variate, f(U). Resuscitating the simulation from an arbitrary distribution from first principles.

“One particularly common scenario where this arises is when one wants to use the (sampled) posterior from one experiment as the prior for another”

But I remained uncertain at the requirement for this representation in implementing nested sampling as I do not see how it helps in bypassing the hurdles of simulating from the prior constrained by increasing levels of the likelihood function. It would be helpful to construct normalising flows adapted to the truncated priors but I did not see anything related to this version in the paper.

The cosmological application therein deals with the incorporation of recent measurements in the study of the ΛCDM cosmological model, that is, more recent that the CMB Planck dataset we played with 15 years ago. (Time flies, even if an expanding Universe!) Namely, the Baryon Oscillation Spectroscopic Survey and the SH0ES collaboration.

## MCMSki [day 2]

Posted in Mountains, pictures, Statistics, University life with tags , , , , , , , , , on January 8, 2014 by xi'an

I was still feeling poorly this morning with my brain in a kind of flu-induced haze so could not concentrate for a whole talk, which is a shame as I missed most of the contents of the astrostatistics session put together by David van Dyk… Especially the talk by Roberto Trotta I was definitely looking for. And the defence of nested sampling strategies for marginal likelihood approximations. Even though I spotted posterior distributions for WMAP and Plank data on the ΛCDM that reminded me of our own work in this area… Apologies thus to all speakers for dozing in and out, it was certainly not due to a lack of interest!

Sebastian Seehars mentioned emcee (for ensemble Monte Carlo), with a corresponding software nicknamed “the MCMC hammer”, and their own CosmoHammer software. I read the paper by Goodman and Ware (2010) this afternoon during the ski break (if not on a ski lift!). Actually, I do not understand why an MCMC should be affine invariant: a good adaptive MCMC sampler should anyway catch up the right scale of the target distribution. Other than that, the ensemble sampler reminds me very much of the pinball sampler we developed with Kerrie Mengersen (1995 Valencia meeting), where the target is the product of L targets,

$\pi(x_1)\cdots\pi(x_L)$

and a Gibbs-like sampler can be constructed, moving one component (with index k, say) of the L-sample at a time. (Just as in the pinball sampler.) Rather than avoiding all other components (as in the pinball sampler), Goodman and Ware draw a single other component at random  (with index j, say) and make a proposal away from it:

$\eta=x_j(t) + \zeta \{x_k(t)-x_j(t)\}$

where ζ is a scale random variable with (log-) symmetry around 1. The authors claim improvement over a single track Metropolis algorithm, but it of course depends on the type of Metropolis algorithms that is chosen… Overall, I think the criticism of the pinball sampler also applies here: using a product of targets can only slow down the convergence. Further, the affine structure of the target support is not a given. Highly constrained settings should not cope well with linear transforms and non-linear reparameterisations would be more efficient….