trans-dimensional nested sampling and a few planets

This morning, on the train to Dauphine (a train that was even more delayed than usual!), I read a recent arXival by Brendon Brewer and Courtney Donovan. Entitled Fast Bayesian inference for exoplanet discovery in radial velocity data, the paper proposes combining Matthew Stephens’ (2000) birth-and-death MCMC approach with nested sampling to infer the number N of exoplanets in an exoplanetary system. The paper is somewhat sparse in its description of the proposed approach, but states that the birth-death moves involve adding a planet with parameters simulated from the prior and removing a planet at random, both being accepted under a likelihood constraint associated with nested sampling. I wonder whether this is in fact the birth-and-death version of Peter Green’s (1995) RJMCMC rather than the continuous-time birth-and-death process version of Matthew…
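Since the paper does not spell the move out, here is only my own minimal sketch of what such a constrained birth-death step could look like, not the authors' implementation. It assumes a uniform prior on N over {0, ..., n_max} and i.i.d. planet parameters, in which case the prior and proposal ratios cancel and the move is accepted if and only if the proposed state satisfies the likelihood constraint. The function names, the one-parameter "planets" and the toy likelihood are all hypothetical.

```python
import random

def constrained_birth_death_step(planets, log_likelihood, log_l_min,
                                 sample_prior, n_max, rng):
    """One birth-or-death move targeting the prior restricted to logL > log_l_min.

    Assumes a uniform prior on N over {0, ..., n_max} and i.i.d. parameters,
    so acceptance reduces to checking the likelihood constraint.
    """
    proposal = list(planets)
    if rng.random() < 0.5:
        # Birth: simulate a new planet from the prior, insert at a random slot.
        if len(proposal) >= n_max:
            return planets  # proposal would leave the support of the prior on N
        proposal.insert(rng.randrange(len(proposal) + 1), sample_prior(rng))
    else:
        # Death: remove one planet chosen uniformly at random.
        if not proposal:
            return planets
        proposal.pop(rng.randrange(len(proposal)))
    # Nested sampling constraint: keep the proposal only above the threshold.
    return proposal if log_likelihood(proposal) > log_l_min else planets

# Toy setting (hypothetical): each "planet" is a single amplitude and the
# data only inform the sum of the amplitudes, which should be close to 5.
def sample_prior(rng):
    return rng.uniform(0.0, 10.0)

def log_likelihood(planets):
    return -(sum(planets) - 5.0) ** 2

rng = random.Random(1)
log_l_min = -25.0   # current likelihood threshold of the nested sampler
state = [5.0]       # a starting point that satisfies the constraint
visited_n = []
for _ in range(500):
    state = constrained_birth_death_step(state, log_likelihood, log_l_min,
                                         sample_prior, n_max=15, rng=rng)
    visited_n.append(len(state))
```

Every visited state satisfies the constraint by construction, so the chain moves across values of N without ever computing a Metropolis-Hastings ratio. The price is that a birth drawn blindly from the prior will rarely satisfy a high threshold.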

“The traditional approach to inferring N also contradicts fundamental ideas in Bayesian computation. Imagine we are trying to compute the posterior distribution for a parameter a in the presence of a nuisance parameter b. This is usually solved by exploring the joint posterior for a and b, and then only looking at the generated values of a. Nobody would suggest the wasteful alternative of using a discrete grid of possible a values and doing an entire Nested Sampling run for each, to get the marginal likelihood as a function of a.”

This criticism is receivable when there is a huge number of possible values of N, even though I see no fundamental contradiction with my ideas about Bayesian computation. However, it is more debatable when there are a few possible values for N, given that the exploration of the augmented space by a RJMCMC algorithm is often very inefficient, in particular when the proposed parameters are generated from the prior. The more when nested sampling is involved and simulations are run under the likelihood constraint! In the astronomy examples given in the paper, N never exceeds 15… Furthermore, once all values of N are merged into a single run, it is unclear how the evidences associated with the various values of N can be computed. At least, those are not reported in the paper.

The paper also does not give the likelihood function, so I do not completely understand where “label switching” occurs. My first impression is that this is not a mixture model. However, if the observed signal (from an exoplanetary system) is the sum of N signals corresponding to N planets, the likelihood is invariant under permutations of the planet labels and label switching makes more sense.
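To make the permutation argument concrete, here is a small check under assumptions of my own, since the paper's likelihood is not available: each planet contributes a sinusoid (circular orbits) described by a hypothetical triple (amplitude, period, phase), and the noise is Gaussian with known standard deviation sigma. The data are made up for the illustration.

```python
import itertools
import math

def radial_velocity(planets, t):
    """Sum of one sinusoid per planet (circular orbits: an assumption)."""
    return sum(k * math.sin(2.0 * math.pi * t / period + phase)
               for k, period, phase in planets)

def log_likelihood(planets, times, velocities, sigma):
    """Gaussian log-likelihood with known noise level sigma (hypothetical)."""
    chi2 = sum(((v - radial_velocity(planets, t)) / sigma) ** 2
               for t, v in zip(times, velocities))
    return -0.5 * chi2 - len(times) * math.log(sigma * math.sqrt(2.0 * math.pi))

# Made-up observation times, planets and velocities.
times = [0.7 * i for i in range(40)]
planets = [(5.0, 12.0, 0.3), (2.0, 31.0, 1.1), (1.0, 5.5, 2.0)]
velocities = [radial_velocity(planets[:2], t) + 0.5 * math.cos(t) for t in times]

# Evaluate the likelihood under every relabelling of the three planets.
values = [log_likelihood(list(perm), times, velocities, sigma=1.0)
          for perm in itertools.permutations(planets)]
```

All 3! = 6 relabellings return the same value up to floating-point rounding, so with an exchangeable prior the posterior has N! symmetric copies of every mode, exactly as in a mixture model.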

10 Responses to “trans-dimensional nested sampling and a few planets”

  1. In response to the original post

    “This criticism is receivable when there is a huge number of possible values of N, even though I see no fundamental contradiction with my ideas about Bayesian computation. However, it is more debatable when there are a few possible values for N, given that the exploration of the augmented space by a RJMCMC algorithm is often very inefficient, in particular when the proposed parameters are generated from the prior.”

    Yes, I agree. This paragraph was aimed at astronomers, many of whom only know about the ‘different trial values of N’ approach.

    “The more when nested sampling is involved and simulations are run under the likelihood constraint!”

    I think it’s less. The DNS (diffusive nested sampling) target distribution is usually easier than the posterior, because the posterior might be dominated by levels 50-70 (say) yet the trans-dimensional moves might be accepted a lot in level 30, where the likelihood constraint is lower.

  2. Thanks Dan, I’ll check that out: by coincidence I just signed up for a one-day workshop on Stan led by Michael this Friday …

  3. Yeah, in these problems the dataset is usually one time series containing a mixture of N deterministic/parametric signals from N planets plus some kind of stochastic noise process. A more detailed description of the algorithm is in http://arxiv.org/pdf/1411.3921v3.pdf

    I wonder what you think of the ‘phase change’ problem as a difficulty for thermodynamic methods, but not in principle for nested sampling? I live in fear of a posterior that contains a minute region of parameter space with a huge spike of likelihood!

    • Thanks, Ewan: this “phase transition” is a wee bit of a mystery to me. The way it is described, it is made of a highly concentrated spike on top of a rather flattish likelihood. It is hard to get an intuition as to why the simulation of points at random over the restricted likelihood levels would favour a visit of the spike region when using an imperfect method like MCMC. For instance, when simulating a Gaussian mixture posterior distribution, there are funnels around the points where one component has zero variance and a mean equal to one of the observations, funnels that go up to infinity, and the nested sampler does not usually visit those funnels.

      • The slab and spike likelihood would be one case: in principle the nested sampler will keep on shrinking its restricted likelihood region until it lassos the spike—provided that the slab part has at least a slight gradient in all directions leading to the spike, so to speak. On the other hand the NS sampler might well reach its stopping condition while exploring the slab, so I’d hardly say it’s guaranteed in reality to succeed.

        Having played with some toy models (e.g. a row of positively or negatively charged ‘atoms’ having log-likelihood proportional to the number of matched neighbours) I think there is an argument for running one chain on a powered-up version of the posterior (e.g. L^10) during practical data analyses, just in case there’s a ‘phase’ to the likelihood that’s not yet been discovered.
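The effect of powering up the likelihood can be seen on a one-dimensional toy example. This is purely an illustration with made-up numbers: a uniform prior on [0, 1], a broad "slab" bump centred at 0.5 and a narrow "spike" at 0.9 that is ten times higher. The sketch computes on a grid the share of mass sitting in the spike under L and under L^10. It involves no chain and says nothing about whether an actual chain would find the spike.

```python
import math

def log_likelihood(x):
    """Slab-and-spike toy likelihood (all constants are hypothetical)."""
    log_slab = -0.5 * ((x - 0.5) / 0.2) ** 2
    log_spike = math.log(10.0) - 0.5 * ((x - 0.9) / 1e-4) ** 2
    hi, lo = max(log_slab, log_spike), min(log_slab, log_spike)
    return hi + math.log1p(math.exp(lo - hi))  # stable log(slab + spike)

def spike_mass_fraction(beta, n_grid=100_001):
    """Share of the mass of L(x)^beta on [0, 1] lying within 0.001 of the spike."""
    xs = [i / (n_grid - 1) for i in range(n_grid)]
    log_w = [beta * log_likelihood(x) for x in xs]
    top = max(log_w)  # subtract the maximum before exponentiating
    total = spike = 0.0
    for x, lw in zip(xs, log_w):
        w = math.exp(lw - top)
        total += w
        if abs(x - 0.9) < 1e-3:
            spike += w
    return spike / total

frac_posterior = spike_mass_fraction(1.0)   # plain posterior, beta = 1
frac_powered = spike_mass_fraction(10.0)    # powered-up target, L^10
```

With these constants the spike region holds well under one percent of the posterior mass, yet essentially all of the mass of the L^10 target, which is why a chain run on the powered-up version acts as a detector for a spike the main run may never visit.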

      • Have you seen the most recent revision of Mike Betancourt’s Adiabatic Monte Carlo paper? I think his explanation in terms of metastabilities in the contact form might be an explanation.

        http://arxiv.org/abs/1405.3489

    • “I live in fear of a posterior that contains a minute region of parameter space with a huge spike of likelihood!”

      It’s much more common that the phase transition occurs at a higher temperature, and that will only affect marginal likelihoods. I’d bet there are many wrong marginal likelihoods in the literature because of phase transitions, but I doubt there are many incorrect posterior distributions. One example of an incorrect posterior distribution is this strange paper by Carlos Rodriguez, where he thinks we should all use Jeffreys priors: http://arxiv.org/abs/0709.1067 For his non-Jeffreys prior the only thing that failed was his MCMC run, which didn’t mix between the two phases.

      I think John Skilling’s obtuse writing style is to blame for people’s lack of understanding of these problems. If you read his 2006 paper in BA it’s mostly about phase transitions, yet many papers since then just use NS because they feel like it / it sounds cool.
