Archive for reversible jump MCMC

a book and two chapters on mixtures

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on January 8, 2019 by xi'an

The Handbook of Mixture Analysis is now out! After a few years of planning, contacts, meetings, discussions about notations, interactions with authors, further interactions with late authors, repeating editing towards homogenisation, and a final professional edit last summer, this collection of nineteen chapters involved thirty-five contributors. I am grateful to all participants to this piece of work, especially to Sylvia Früwirth-Schnatter for being a driving force in the project and for achieving a much higher degree of homogeneity in the book than I expected. I would also like to thank Rob Calver and Lara Spieker of CRC Press for their boundless patience through the many missed deadlines and their overall support.

Two chapters which I co-authored are now available as arXived documents:

5. Gilles Celeux, Kaniav Kamary, Gertraud Malsiner-Walli, Jean-Michel Marin, and Christian P. Robert, Computational Solutions for Bayesian Inference in Mixture Models
7. Gilles Celeux, Sylvia Früwirth-Schnatter, and Christian P. Robert, Model Selection for Mixture Models – Perspectives and Strategies

along other chapters

1. Peter Green, Introduction to Finite Mixtures
8. Bettina Grün, Model-based Clustering
12. Isobel Claire Gormley and Sylvia Früwirth-Schnatter, Mixtures of Experts Models
13. Sylvia Kaufmann, Hidden Markov Models in Time Series, with Applications in Economics
14. Elisabeth Gassiat, Mixtures of Nonparametric Components and Hidden Markov Models
19. Michael A. Kuhn and Eric D. Feigelson, Applications in Astronomy

computational statistics and molecular simulation [18w5023]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , , on November 15, 2018 by xi'an

 I truly missed the gist of the first talk of the Wednesday morning of our X fertilisation workshop by Jianfeng Lu partly due to notations, although the topic very much correlated to my interests like path sampling, with an augmented version of HMC using an auxiliary indicator. And mentions made of BAOAB. Next, Marcello Pereyra spoke about Bayesian image analysis, with the difficulty of setting a prior on an image. In case of astronomical images there are motivations for an L¹ penalisation sparse prior. Sampling is an issue. Moreau-Yoshida proximal optimisation is used instead, in connection with our MCMC survey published in Stats & Computing two years ago. Transferability was a new concept for me, as introduced by Kerrie Mengersen (QUT), to extrapolate an estimated model to another system without using the posterior as a prior. With a great interlude about the crown of thorns starfish killer robot! Rather a prior determination based on historical data, in connection with recent (2018) Technometrics and Bayesian Analysis papers towards rejecting non-plausible priors. Without reading the papers (!), and before discussing the matter with Kerrie, here or in Marseille, I wonder at which level of precision this can be conducted. The use of summary statistics for prior calibration gave the approach an ABC flavour.

The hand-on session was Jonathan Mattingly’s discussion of gerrymandering reflecting on his experience at court! Hard to beat for an engaging talk reaching between communities. As it happens I discussed the original paper last year. Of course it was much more exciting to listen to Jonathan explaining his vision of the problem! Too bad I “had” to leave before the end for a [most enjoyable] rock climbing afternoon… To be continued at the dinner table! (Plus we got the complete explanation of the term gerrymandering, including this salamander rendering of the first identified as gerrymandered district!)

computational statistics and molecular simulation [18w5023]

Posted in Statistics with tags , , , , , , , , , , , , , , on November 13, 2018 by xi'an

This X fertilisation workshop Gabriel Stolz, Luke Bornn and myself organised towards reinforcing the interface between molecular dynamics and Monte Carlo statistical methods has now started! At the casa matematicà Oaxaca, the Mexican campus of BIRS, which is currently housed by a very nice hotel on the heights of Oaxaca. And after a fairly long flight for a large proportion of the participants. On the first day, Arthur Voter gave a fantastic “hand-on” review of molecular dynamics for material sciences, which was aimed at the statistician side of the audience and most helpful in my own understanding of the concepts and techniques at the source of HMC and PDMP algorithms. (Although I could not avoid a few mini dozes induced by jetlag.) Including the BAOAB version of HMC, which sounded to me like an improvement to investigate. The part on metastability, completed by a talk by Florian Maire, remained a wee bit mysterious [to me].

The shorter talks of the day all brought new perspectives and information to me (although they were definitely more oriented towards their “own” side of the audience than the hand-on lecture). For instance, Jesús María Sanz-Serna gave a wide ranging overview of numerical integrators and Tony Lelièvre presented a recent work on simulating measures supported by manifolds via an HMC technique constantly projecting over the manifold, with proper validation. (I had struggled with the paper this summer and this talk helped a lot.) There was a talk by Josh Fash on simulating implicit solvent models that mixed high-level programming and reversible jump MCMC, with an earlier talk by Yong Chen on variable dimension hidden Markov models that could have also alluded to reversible jump. Angela Bito talked about using ASIS (Ancillarity-sufficiency interweaving strategy) for improving the dynamics of an MCMC sampler associated with a spike & slab prior, the recentering-decentering cycle being always a sort of mystery to me [as to why it works better despite introducing multimodality in this case], and Gael Martin presented some new results on her on-going work with David Frazier about approximate Bayes with misspecified models, with the summary statistic being a score function that relates the work to the likelihood free approach of Bissiri et al.

normalising constants of G-Wishart densities

Posted in Books, Statistics with tags , , , , , , on June 28, 2017 by xi'an

Abdolreza Mohammadi, Hélène Massam, and Gérard Letac arXived last week a paper on a new approximation of the ratio of two normalising constants associated with two G-Wishart densities associated with different graphs G. The G-Wishart is the generalisation of the Wishart distribution by Alberto Roverato to the case when some entries of the matrix are equal to zero, which locations are associated with the graph G. While enjoying the same shape as the Wishart density, this generalisation does not enjoy a closed form normalising constant. Which leads to an intractable ratio of normalising constants when doing Bayesian model selection across different graphs.

Atay-Kayis and Massam (2005) expressed the ratio as a ratio of two expectations, and the current paper shows that this leads to an approximation of the ratio of normalising constants for a graph G against the graph G augmented by the edge e, equal to

Γ(½{δ+d}) / 2 √π Γ(½{δ+d+1})

where δ is the degree of freedom of the G-Wishart and d is the number of minimal paths of length 2 linking the two end points of e. This is remarkably concise and provides a fast approximation. (The proof is quite involved, by comparison.) Which can then be used in reversible jump MCMC. The difficulty is obviously in evaluating the impact of the approximation on the target density, as there is no manageable available alternative to calibrate the approximation. In a simulation example where such an alternative is available, the error is negligible though.

repulsive mixtures

Posted in Books, Statistics with tags , , , , , , , , on April 10, 2017 by xi'an

Fangzheng Xie and Yanxun Xu arXived today a paper on Bayesian repulsive modelling for mixtures. Not that Bayesian modelling is repulsive in any psychological sense, but rather that the components of the mixture are repulsive one against another. The device towards this repulsiveness is to add a penalty term to the original prior such that close means are penalised. (In the spirit of the sugar loaf with water drops represented on the cover of Bayesian Choice that we used in our pinball sampler, repulsiveness being there on the particles of a simulated sample and not on components.) Which means a prior assumption that close covariance matrices are of lesser importance. An interrogation I have has is was why empty components are not excluded as well, but this does not make too much sense in the Dirichlet process formulation of the current paper. And in the finite mixture version the Dirichlet prior on the weights has coefficients less than one.

The paper establishes consistency results for such repulsive priors, both for estimating the distribution itself and the number of components, K, under a collection of assumptions on the distribution, prior, and repulsiveness factors. While I have no mathematical issue with such results, I always wonder at their relevance for a given finite sample from a finite mixture in that they give an impression that the number of components is a perfectly estimable quantity, which it is not (in my opinion!) because of the fluid nature of mixture components and therefore the inevitable impact of prior modelling. (As Larry Wasserman would pound in, mixtures like tequila are evil and should likewise be avoided!)

The implementation of this modelling goes through a “block-collapsed” Gibbs sampler that exploits the latent variable representation (as in our early mixture paper with Jean Diebolt). Which includes the Old Faithful data as an illustration (for which a submission of ours was recently rejected for using too old datasets). And use the logarithm of the conditional predictive ordinate as  an assessment tool, which is a posterior predictive estimated by MCMC, using the data a second time for the fit.

SMC on a sequence of increasing dimension targets

Posted in Statistics with tags , , , , , , , , , on February 15, 2017 by xi'an

mixdirRichard Everitt and co-authors have arXived a preliminary version of a paper entitled Sequential Bayesian inference for mixture models and the coalescent using sequential Monte Carlo samplers with transformations. The central notion is an SMC version of the Carlin & Chib (1995) completion in the comparison of models in different dimensions. Namely to create auxiliary variables for each model in such a way that the dimension of the completed models are all the same. (Reversible jump MCMC à la Peter Green (1995) can also be interpreted this way, even though only relevant bits of the completion are used in the transitions.) I find the paper and the topic most interesting if only because it relates to earlier papers of us on population Monte Carlo. It also brought to my awareness the paper by Karagiannis and Andrieu (2013) on annealed reversible jump MCMC that I had missed at the time it appeared. The current paper exploits this annealed expansion in the devising of the moves. (Sequential Monte Carlo on a sequence of models with increasing dimension has been studied in the past.)

The way the SMC is described in the paper, namely, reweight-subsample-move, does not strike me as the most efficient as I would try to instead move-reweight-subsample, using a relevant move that incorporate the new model and hence enhance the chances of not rejecting.

One central application of the paper is mixture models with an unknown number of components. The SMC approach applied to this problem means creating a new component at each iteration t and moving the existing particles after adding the parameters of the new component. Since using the prior for this new part is unlikely to be at all efficient, a split move as in Richardson and Green (1997) can be considered, which brings back the dreaded Jacobian of RJMCMC into the picture! Here comes an interesting caveat of the method, namely that the split move forces a choice of the split component of the mixture. However, this does not appear as a strong difficulty, solved in the paper by auxiliary [index] variables, but possibly better solved by a mixture representation of the proposal, as in our PMC [population Monte Carlo] papers. Which also develop a family of SMC algorithms, incidentally. We found there that using a mixture representation of the proposal achieves a provable variance reduction.

“This puts a requirement on TSMC that the single transition it makes must be successful.”

As pointed by the authors, the transformation SMC they develop faces the drawback that a given model is only explored once in the algorithm, when moving to the next model. On principle, there would be nothing wrong in including regret steps, retracing earlier models in the light of the current one, since each step is an importance sampling step valid on its own right. But SMC also offers a natural albeit potentially high-varianced approximation to the marginal likelihood, which is quite appealing when comparing with an MCMC outcome. However, it would have been nice to see a comparison with alternative estimates of the marginal in the case of mixtures of distributions. I also wonder at the comparative performances of a dual approach that would be sequential in the number of observations as well, as in Chopin (2004) or our first population Monte Carlo paper (Cappé et al., 2005), since subsamples lead to tempered versions of the target and hence facilitate moves between models, being associated with flatter likelihoods.

Bayesian model selection without evidence

Posted in Books, Statistics, University life with tags , , , , , , , on September 20, 2016 by xi'an

“The new method circumvents the challenges associated with accurate evidence calculations by computing posterior odds ratios using Bayesian parameter estimation”

One paper leading to another, I had a look at Hee et al. 2015 paper on Bayes factor estimation. The “novelty” stands in introducing the model index as an extra parameter in a single model encompassing all models under comparison, the “new” parameterisation being in (θ,n) rather than in θ. With the distinction that the parameter θ is now made of the union of all parameters across all models. Which reminds us very much of Carlin and Chib (1995) approach to the problem. (Peter Green in his Biometrika (1995) paper on reversible jump MCMC uses instead a direct sum of parameter spaces.) The authors indeed suggest simulating jointly (θ,n) in an MCMC or nested sampling scheme. Rather than being updated by arbitrary transforms as in Carlin and Chib (1995) the useless parameters from the other models are kept constant… The goal being to estimate P(n|D) the marginal posterior on the model index, aka the posterior probability of model n.

Now, I am quite not certain keeping the other parameter constants is a valid move: given a uniform prior on n and an equally uniform proposal, the acceptance probability simplifies into the regular Metropolis-Hastings ratio for model n. Hence the move is valid within model n. If not, I presume the previous pair (θ⁰,n⁰) is repeated. Wait!, actually, this is slightly more elaborate: if a new value of n, m, is proposed, then the acceptance ratio involves the posteriors for both n⁰ and m, possibly only the likelihoods when the proposal is the prior. So the move will directly depend on the likelihood ratio in this simplified case, which indicates the scheme could be correct after all. Except that this neglects the measure theoretic subtleties that led to reversible jump symmetry and hence makes me wonder. In other words, it follows exactly the same pattern as reversible jump without the constraints of the latter… Free lunch,  anyone?!