Archive for birth-and-death process

SMC on a sequence of increasing dimension targets

Posted in Statistics on February 15, 2017 by xi'an

Richard Everitt and co-authors have arXived a preliminary version of a paper entitled Sequential Bayesian inference for mixture models and the coalescent using sequential Monte Carlo samplers with transformations. The central notion is an SMC version of the Carlin & Chib (1995) completion in the comparison of models of different dimensions, namely to create auxiliary variables for each model in such a way that the dimensions of the completed models are all the same. (Reversible jump MCMC à la Peter Green (1995) can also be interpreted this way, even though only relevant bits of the completion are used in the transitions.) I find the paper and the topic most interesting, if only because they relate to earlier papers of ours on population Monte Carlo. It also brought to my awareness the paper by Karagiannis and Andrieu (2013) on annealed reversible jump MCMC that I had missed at the time it appeared. The current paper exploits this annealed expansion in devising the moves. (Sequential Monte Carlo on a sequence of models with increasing dimension has been studied in the past.)
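
To fix ideas, here is a toy sketch (mine, not taken from the paper) of the Carlin & Chib completion for two hypothetical Gaussian models of dimensions one and two, where the block of the model not currently indexed is given an arbitrary pseudo-prior so that the completed state always has the same dimension; all densities below are illustrative placeholders.

```r
## Toy Carlin & Chib (1995) completion: models of different dimensions are embedded
## in a common space by assigning the parameters of the model *not* selected a
## pseudo-prior, so every completed state (k, theta1, theta2) has the same dimension.
log_completed_target <- function(k, theta1, theta2, y) {
  if (k == 1) {
    loglik <- sum(dnorm(y, mean = theta1[1], log = TRUE))               # model 1: unknown mean, unit variance
    logpri <- dnorm(theta1[1], 0, 10, log = TRUE) +
              sum(dnorm(theta2, 0, 1, log = TRUE))                      # pseudo-prior on the model 2 block
  } else {
    loglik <- sum(dnorm(y, mean = theta2[1], sd = exp(theta2[2]), log = TRUE))  # model 2: mean and log-sd
    logpri <- dnorm(theta2[1], 0, 10, log = TRUE) + dnorm(theta2[2], 0, 1, log = TRUE) +
              dnorm(theta1[1], 0, 1, log = TRUE)                        # pseudo-prior on the model 1 block
  }
  loglik + logpri + log(0.5)                                            # uniform prior over the two models
}
```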

The way the SMC is described in the paper, namely reweight-subsample-move, does not strike me as the most efficient: I would instead try move-reweight-subsample, using a relevant move that incorporates the new model and hence enhances the chances of the move not being rejected.
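
For concreteness, here is a minimal sketch in R of one SMC iteration under both orderings, with move_kernel() and incremental_weight() as generic placeholders rather than the moves of the paper; note that under the move-first ordering the incremental weight has to account for the move itself, e.g., through a backward kernel.

```r
## One SMC iteration under the two orderings (a sketch, not the authors' code),
## assuming resampling at every step so that incoming weights are uniform.

smc_reweight_resample_move <- function(x, move_kernel, incremental_weight) {
  w <- sapply(x, incremental_weight)                   # reweight the *current* particles
  idx <- sample(length(x), replace = TRUE, prob = w)   # multinomial resampling
  lapply(x[idx], move_kernel)                          # move; an invariant kernel keeps weights uniform
}

smc_move_reweight_resample <- function(x, move_kernel, incremental_weight) {
  xnew <- lapply(x, move_kernel)                       # move first, towards the new model
  w <- mapply(incremental_weight, xnew, x)             # weight depends on old and moved particle (backward kernel)
  idx <- sample(length(xnew), replace = TRUE, prob = w)
  xnew[idx]
}
```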

One central application of the paper is mixture models with an unknown number of components. The SMC approach applied to this problem means creating a new component at each iteration t and moving the existing particles after adding the parameters of the new component. Since using the prior for this new part is unlikely to be at all efficient, a split move as in Richardson and Green (1997) can be considered, which brings the dreaded Jacobian of RJMCMC back into the picture! Here comes an interesting caveat of the method, namely that the split move forces a choice of the component of the mixture to be split. However, this does not appear to be a strong difficulty: it is solved in the paper by auxiliary [index] variables, but could possibly be better handled by a mixture representation of the proposal, as in our PMC [population Monte Carlo] papers, which incidentally also develop a family of SMC algorithms. We found there that using a mixture representation of the proposal achieves a provable variance reduction.
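
As a reminder of what such a split move looks like, here is a sketch of the Richardson and Green (1997) moment-matching split of a single Gaussian component, with the Jacobian entering the reversible jump acceptance ratio; this is illustrative only and not the transformation actually used in the TSMC paper.

```r
## Split one component (w, mu, sigma2) of a univariate Gaussian mixture into two,
## following Richardson & Green (1997), and return the Jacobian of the transform.
split_component <- function(w, mu, sigma2) {
  u1 <- rbeta(1, 2, 2); u2 <- rbeta(1, 2, 2); u3 <- rbeta(1, 1, 1)
  w1 <- w * u1
  w2 <- w * (1 - u1)
  mu1 <- mu - u2 * sqrt(sigma2) * sqrt(w2 / w1)
  mu2 <- mu + u2 * sqrt(sigma2) * sqrt(w1 / w2)
  s1  <- u3 * (1 - u2^2) * sigma2 * w / w1
  s2  <- (1 - u3) * (1 - u2^2) * sigma2 * w / w2
  ## Jacobian of (w, mu, sigma2, u1, u2, u3) -> (w1, mu1, s1, w2, mu2, s2)
  jac <- w * abs(mu1 - mu2) * s1 * s2 / (u2 * (1 - u2^2) * u3 * (1 - u3) * sigma2)
  list(w = c(w1, w2), mu = c(mu1, mu2), sigma2 = c(s1, s2), jacobian = jac)
}
```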

“This puts a requirement on TSMC that the single transition it makes must be successful.”

As pointed out by the authors, the transformation SMC they develop faces the drawback that a given model is only explored once in the algorithm, when moving to the next model. In principle, there would be nothing wrong with including regret steps, retracing earlier models in the light of the current one, since each step is an importance sampling step valid in its own right. But SMC also offers a natural, albeit potentially high-variance, approximation to the marginal likelihood, which is quite appealing when comparing with an MCMC outcome. However, it would have been nice to see a comparison with alternative estimates of the marginal likelihood in the case of mixtures of distributions. I also wonder at the comparative performances of a dual approach that would be sequential in the number of observations as well, as in Chopin (2004) or our first population Monte Carlo paper (Cappé et al., 2005), since subsamples lead to tempered versions of the target, associated with flatter likelihoods, and hence facilitate moves between models.
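
For reference, the SMC estimate of the marginal likelihood alluded to above is the product over iterations of the averages of the unnormalised incremental weights; here is a minimal sketch, with incremental_weight() and move_kernel() once again standing in for the model-specific ingredients.

```r
## Running log marginal likelihood estimate along an SMC run, assuming multinomial
## resampling at every iteration (so incoming weights are uniform).
smc_log_evidence <- function(particles, n_iter, incremental_weight, move_kernel) {
  N <- length(particles)
  logZ <- 0
  for (t in seq_len(n_iter)) {
    wtilde <- sapply(particles, incremental_weight, t = t)   # unnormalised incremental weights
    logZ <- logZ + log(mean(wtilde))                         # accumulate log of their average
    idx <- sample(N, replace = TRUE, prob = wtilde)          # resample
    particles <- lapply(particles[idx], move_kernel, t = t)  # invariant move within model t
  }
  logZ
}
```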

trans-dimensional nested sampling and a few planets

Posted in Books, Statistics, Travel, University life on March 2, 2015 by xi'an

This morning, in the train to Dauphine (a train that was even more delayed than usual!), I read a recent arXival of Brendon Brewer and Courtney Donovan. Entitled Fast Bayesian inference for exoplanet discovery in radial velocity data, the paper suggests associating Matthew Stephens' (2000) birth-and-death MCMC approach with nested sampling to infer about the number N of exoplanets in an exoplanetary system. The paper is somewhat sparse in its description of the suggested approach, but states that the birth-death moves involve adding a planet with parameters simulated from the prior and removing a planet at random, both being accepted under a likelihood constraint associated with nested sampling. I actually wonder if this is the birth-death version of Peter Green’s (1995) RJMCMC rather than the continuous time birth-and-death process version of Matthew…
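
As I read it, the move could be sketched as follows (my reconstruction, not code from the paper), with prior_draw() and log_likelihood() as placeholders and L_star the current nested sampling likelihood threshold.

```r
## Birth/death proposal under a nested sampling likelihood constraint: a planet drawn
## from the prior is added, or a planet removed at random, and the proposal is kept
## only if the new likelihood exceeds the current constraint L_star.
birth_death_move <- function(planets, L_star, prior_draw, log_likelihood) {
  prop <- planets                                   # planets: list of per-planet parameter vectors
  if (runif(1) < 0.5 || length(planets) == 0) {
    prop[[length(prop) + 1]] <- prior_draw()        # birth: add a planet simulated from the prior
  } else {
    prop[[sample(length(prop), 1)]] <- NULL         # death: remove a planet at random
  }
  if (log_likelihood(prop) > L_star) prop else planets  # hard constraint of nested sampling
}
```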

“The traditional approach to inferring N also contradicts fundamental ideas in Bayesian computation. Imagine we are trying to compute the posterior distribution for a parameter a in the presence of a nuisance parameter b. This is usually solved by exploring the joint posterior for a and b, and then only looking at the generated values of a. Nobody would suggest the wasteful alternative of using a discrete grid of possible a values and doing an entire Nested Sampling run for each, to get the marginal likelihood as a function of a.”

This criticism is receivable when there is a huge number of possible values of N, even though I see no fundamental contradiction with my ideas about Bayesian computation. However, it is more debatable when there are only a few possible values for N, given that the exploration of the augmented space by an RJMCMC algorithm is often very inefficient, in particular when the proposed parameters are generated from the prior. All the more so when nested sampling is involved and simulations are run under the likelihood constraint! In the astronomy examples given in the paper, N never exceeds 15… Furthermore, by merging all N’s together, it is unclear how the evidences associated with the various values of N can be computed. At least, those are not reported in the paper.

The paper also omits the likelihood function, so I do not completely understand where “label switching” occurs therein. My first impression was that this is not a mixture model. However, if the observed signal (from an exoplanetary system) is the sum of N signals corresponding to N planets, this makes more sense.
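
A toy version of such a likelihood (with simplified sinusoidal signals rather than full Keplerian orbits) makes the permutation invariance, and hence the label switching, apparent.

```r
## The radial velocity signal as a sum over planets: the likelihood is invariant
## under permutations of the planet labels. Each planet is a list with amplitude K,
## period P and phase phi (hypothetical parameterisation, for illustration only).
rv_model <- function(t, planets) {
  Reduce(`+`, lapply(planets, function(p) p$K * sin(2 * pi * t / p$P + p$phi)), 0)
}
log_likelihood <- function(y, t, planets, sigma) {
  sum(dnorm(y, mean = rv_model(t, planets), sd = sigma, log = TRUE))
}
```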

structure and uncertainty, Bristol, Sept. 27

Posted in pictures, Running, Statistics, Travel, University life on September 28, 2012 by xi'an

The last sessions at the SuSTain workshop were equally riveting, but alas I had to leave early to get a noon flight—as it happens, while I expected to get home early enough to work, run, cook, and do maths with my daughter, my taxi got stuck in an endless traffic jam and I only had time for the maths!—hence missing the talks by Chris Holmes—second time after Kyoto!—, Sofia Massa, and Arnoldo Frigessi… I am glad I managed to get Michael Newton’s and Forrest Crawford’s talks, though, as Michael presented a highly pedagogical entry to computational concepts related to systems biology (a potential candidate for an MCMSki IV talk?) and Forrest discussed some birth-and-death processes, including the Yule process, that allow for closed form expressions of their Laplace transforms via continued fractions. (Continued fractions, one of my favourite mathematical objects!!! Rarely appearing in statistics, though…) I have to check Forrest’s recent papers to understand how widely this approach applies to phylogenetic trees, but it opens a fairly interesting alternative to ABC!

This was a highly enjoyable meeting, first and foremost due to the quality of the talks and of their scheduling, but also to the pleasure of seeing again many friends of many years—notice how I carefully avoided using “old friends”!—, to the relaxed and open atmosphere of the workshop—in the terrific location of Goldney Hall—and of course to unofficially celebrating Peter Green’s deeds and contributions to the field, the profession, and the statistics group in Bristol! Deeds and contributions so far, that is, as I am sure he will keep contributing in many ways in the coming years and decades, as already shown by his committed involvement in the very recent creation of BayesComp. I thus most gladly join the other participants of this workshop both to thank him most sincerely for those many and multifaceted contributions and to wish him all the best for the coming decades!

As an aside, I also enjoyed being “back” in Bristol once again, as I do like the city, the surrounding Somerset countryside, nearby South Wales, and the wide running possibilities (from the Downs to the Mendip Hills!). While I have sampled many great hotels in Bristol and Clifton over the years, I now rank the Avon Gorges Hotel, where I stayed this time, quite high on the list, both for its convenient (running!) location and its top-quality facilities (incl. high-speed WiFi!).

CoRe in CiRM [end]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, Travel, University life on July 18, 2010 by xi'an

Back home after those two weeks in CiRM for our “research in pairs” invitation to work on the new edition of Bayesian Core, I am very grateful for the support we received from CiRM and, through it, from SMF and CNRS. Being “locked” away in such a remote place brought a considerable increase in concentration and decrease in stress levels. Although I was planning for more, we have made substantial advances on five chapters of the book (out of nine), including a completely new chapter (Chapter 8) on hierarchical models and a thorough rewriting of the normal chapter (Chapter 2), which, along with Chapter 1 (largely inspired by Chapter 1 of Introducing Monte Carlo Methods with R, itself inspired by the first edition of Bayesian Core!), is nearly done. Chapter 9 on image processing is also quite close to completion, with just the result of a batch simulation running on the Linux server in Dauphine to be included in the ABC section. The only remaining major change is the elimination of reversible jump from the mixture chapter (to be replaced with Chib’s approximation) and from the time-series chapter (to be simplified into a birth-and-death process).

Going back to the CiRM environment, I think we were lucky to come during the vacation season, as there is hardly anyone on the campus, which means no cars and no noise. The (good) feeling of remoteness is not as extreme as in Oberwolfach, but it is truly a quality environment. Besides, being able to work 24/7 in the math library is a major plus, as we could go and grab any reference we needed to check. (Presumably, CiRM is lacking in terms of statistics books compared with Oberwolfach, while still providing most of the references we were looking for.) At last, the freedom to walk right out of the Centre into the national park for a run, a climb or even a swim (in Morgiou, rather than Sugiton) makes working there very tantalising indeed! I thus dearly hope I can enjoy this opportunity again in the near future…

A Vanilla Rao-Blackwellisation (comments)

Posted in Statistics on August 26, 2009 by xi'an

One of the authors of “On convergence of importance sampling and other properly weighted samples to the target distribution” by S. Malefaki and G. Iliopoulos sent me their paper (now published in JSPI, 2008, pp. 1210-1225) to point out the connection with our Vanilla Rao-Blackwellisation paper. There is indeed a link, in that those authors also exploit the sequence of accepted values in an MCMC sequence to build up geometric weights based on the distribution of those accepted rv’s. The paper also relates more strongly to the series of papers published by Jun Liu and coauthors in JASA in the early 2000s about random importance weights, and even more to the birth-and-death jump processes introduced by Brian Ripley in his 1987 simulation book, and studied in Geyer and Møller (1994), Grenander and Miller (1994) and Phillips and Smith (1996), which led to the birth-and-death MCMC approach of Matthew Stephens in his thesis and 2000 Annals paper. As later analysed in our 2003 Series B paper, this jump process approach is theoretically valid but may lead to difficulties at the implementation stage. The first one is that each proposed value is accepted, albeit briefly, and thus, with proposals that exhibit null recurrent or transient behaviour, it may take “forever” to go to infinity and back. The second one is that the perspective offered by this representation—which in the case of the standard Metropolis algorithm does not involve any modification—gives a vision of Metropolis algorithms as a rough version of an importance sampling algorithm. While this somehow is also the case for our Vanilla paper, the whole point of using a Metropolis or a Gibbs algorithm is precisely to avoid picking an importance sampling distribution in complex settings, because such distributions are almost necessarily inefficient, and instead to exploit some features of the target to build the proposals. (This is obviously a matter of perspective on the presentation of the analysis in the above paper, nothing being wrong with its mathematics.)
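
To illustrate the representation in question, here is a small sketch (mine, on a toy normal target rather than anything from either paper): the output of a Metropolis chain can be rewritten as the sequence of accepted values, each weighted by the number of iterations it persists, i.e., by a geometric holding time.

```r
## Rewrite a Metropolis chain as a weighted sample of its accepted values.
weighted_accepted <- function(chain) {
  r <- rle(chain)                        # runs of repeated values = holding times of accepted values
  data.frame(value = r$values, weight = r$lengths)
}

## Toy random-walk Metropolis on a N(0,1) target
set.seed(1)
n <- 1e4
x <- numeric(n)
for (t in 2:n) {
  y <- x[t - 1] + rnorm(1, sd = 2)
  x[t] <- if (log(runif(1)) < dnorm(y, log = TRUE) - dnorm(x[t - 1], log = TRUE)) y else x[t - 1]
}
w <- weighted_accepted(x)
sum(w$value * w$weight) / sum(w$weight)  # weighted average of accepted values, i.e., the posterior mean estimate
```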