## trans-dimensional nested sampling and a few planets

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , on March 2, 2015 by xi'an

This morning, in the train to Dauphine (train that was even more delayed than usual!), I read a recent arXival of Brendon Brewer and Courtney Donovan. Entitled Fast Bayesian inference for exoplanet discovery in radial velocity data, the paper suggests to associate Matthew Stephens’ (2000)  birth-and-death MCMC approach with nested sampling to infer about the number N of exoplanets in an exoplanetary system. The paper is somewhat sparse in its description of the suggested approach, but states that the birth-date moves involves adding a planet with parameters simulated from the prior and removing a planet at random, both being accepted under a likelihood constraint associated with nested sampling. I actually wonder if this actually is the birth-date version of Peter Green’s (1995) RJMCMC rather than the continuous time birth-and-death process version of Matthew…

“The traditional approach to inferring N also contradicts fundamental ideas in Bayesian computation. Imagine we are trying to compute the posterior distribution for a parameter a in the presence of a nuisance parameter b. This is usually solved by exploring the joint posterior for a and b, and then only looking at the generated values of a. Nobody would suggest the wasteful alternative of using a discrete grid of possible a values and doing an entire Nested Sampling run for each, to get the marginal likelihood as a function of a.”

This criticism is receivable when there is a huge number of possible values of N, even though I see no fundamental contradiction with my ideas about Bayesian computation. However, it is more debatable when there are a few possible values for N, given that the exploration of the augmented space by a RJMCMC algorithm is often very inefficient, in particular when the proposed parameters are generated from the prior. The more when nested sampling is involved and simulations are run under the likelihood constraint! In the astronomy examples given in the paper, N never exceeds 15… Furthermore, by merging all N’s together, it is unclear how the evidences associated with the various values of N can be computed. At least, those are not reported in the paper.

The paper also omits to provide the likelihood function so I do not completely understand where “label switching” occurs therein. My first impression is that this is not a mixture model. However if the observed signal (from an exoplanetary system) is the sum of N signals corresponding to N planets, this makes more sense.

## the intelligent-life lottery

Posted in Books, Kids with tags , , , , , , , on August 24, 2014 by xi'an

In a theme connected with one argument in Dawkins’ The God Delusion, The New York Time just published a piece on the 20th anniversary of the debate between Carl Sagan and Ernst Mayr about the likelihood of the apparition of intelligent life. While 20 years ago, there was very little evidence if any of the existence of Earth-like planets, the current estimate is about 40 billions… The argument against the high likelihood of other inhabited planets is that the appearance of life on Earth is an accumulation of unlikely events. This is where the paper goes off-road and into the ditch, in my opinion, as it makes the comparison of the emergence of intelligent (at the level of human) life to be “as likely as if a Powerball winner kept buying tickets and — round after round — hit a bigger jackpot each time”. The later having a very clearly defined probability of occurring. Since “the chance of winning the grand prize is about one in 175 million”. The paper does not tell where the assessment of this probability can be found for the emergence of human life and I very much doubt it can be justified. Given the myriad of different species found throughout the history of evolution on Earth, some of which evolved and many more which vanished, I indeed find it hard to believe that evolution towards higher intelligence is the result of a basically zero probability event. As to conceive that similar levels of intelligence do exist on other planets, it also seems more likely than not that life took on average the same span to appear and to evolve and thus that other inhabited planets are equally missing means to communicate across galaxies. Or that the signals they managed to send earlier than us have yet to reach us. Or Earth a long time after the last form of intelligent life will have vanished…

## Harmonic means, again

Posted in Statistics, University life with tags , , , , , , , , on January 3, 2012 by xi'an

Over the non-vacation and the vacation breaks of the past weeks, I skipped a lot of arXiv postings. This morning, I took a look at “Probabilities of exoplanet signals from posterior samplings” by Mikko Tuomi and Hugh R. A. Jones. This is a paper to appear in Astronomy and Astrophysics, but the main point [to me] is to study a novel approximation to marginal likelihood. The authors propose what looks at first as defensive sampling: given a likelihood f(x|θ) and a corresponding Markov chaini), the approximation is based on the following importance sampling representation

$\hat m(x) = \sum_{i=h+1}^N \dfrac{f(x|\theta_i)}{(1-\lambda) f(x|\theta_i) + \lambda f(x|\theta_{i-h})}\Big/$

$\sum_{i=h+1}^N \dfrac{1}{(1-\lambda) f(x|\theta_i) + \lambda f(x|\theta_{i-h})}$

This is called a truncated posterior mixture approximation and, under closer scrutiny, it is not defensive sampling. Indeed the second part in the denominators does not depend on the parameter θi, therefore, as far as importance sampling is concerned, this is a constant (if random) term! The authors impose a bounded parameter space for this reason, however I do not see why such an approximation is converging. Except when λ=0, of course, which brings us back to the original harmonic mean estimator. (Properly rejected by the authors for having a very slow convergence. Or, more accurately, generally no stronger convergence than the law of large numbers.)  Furthermore, the generic importance sampling argument does not work here since, if

$g(\theta) \propto (1-\lambda) \pi(\theta|x) + \lambda \pi(\theta_{i-h}|x)$

is the importance function, the ratio

$\dfrac{\pi(\theta_i)f(x|\theta_i)}{(1-\lambda) \pi(\theta|x) + \lambda \pi(\theta_{i-h}|x)}$

does not simplify… I do not understand either why the authors compare Bayes factors approximations based on this technique, on the harmonic mean version or on Chib and Jeliazkov’s (2001) solution with both DIC and AIC, since the later are not approximations to the Bayes factors. I am therefore quite surprised at the paper being accepted for publication, given that the numerical evaluation shows the impact of the coefficient λ does not vanish with the number of simulations. (Which is logical given the bias induced by the additional term.)