Following the posting on arXiv of the Statistical Science paper of Carvalho et al., and the publication by the same authors in Bayesian Analysis of Particle Learning for general mixtures, I noticed on Hedibert Lopes’ website that his rejoinder to the discussion of his Valencia 9 paper has been posted. Since the discussion involved several points made by members of the CREST statistics lab (and covered the mixture paper as much as the Valencia 9 paper), I was quite eager to read Hedie’s reply. Unsurprisingly, this rejoinder is unlikely to modify my reservations about particle learning. The following is a detailed examination of the arguments found in the rejoinder, but it requires a preliminary reading of the above papers as well as of our discussion.
Archive for Valencia 9
With Pierre Jacob, my PhD student, and Murray Smith, from the National Institute of Water and Atmospheric Research, Wellington, who actually started us on this project at the last and latest Valencia meeting, we have completed a paper on using parallel computing in independent Metropolis-Hastings algorithms. The paper is arXived and the abstract goes as follows:
In this paper, we consider the implications of the fact that parallel raw-power can be exploited by a generic Metropolis–Hastings algorithm if the proposed values are independent. In particular, we present improvements to the independent Metropolis–Hastings algorithm that significantly decrease the variance of any estimator derived from the MCMC output, for a null computing cost since those improvements are based on a fixed number of target density evaluations. Furthermore, the techniques developed in this paper do not jeopardize the Markovian convergence properties of the algorithm, since they are based on the Rao–Blackwell principles of Gelfand and Smith (1990), already exploited in Casella and Robert (1996), Atchadé and Perron (2005) and Douc and Robert (2010). We illustrate those improvements both on a toy normal example and on a classical probit regression model but insist on the fact that they are universally applicable.
I am quite excited about the results in this paper, which took advantage of (a) older works of mine on Rao-Blackwellisation, (b) Murray’s interests in costly likelihoods, and (c) our mutual excitement when hearing about GPU parallel possibilities from Chris Holmes’ talk in Valencia. (As well as directions drafted in an exciting session in Vancouver!) The (free) gains over standard independent Metropolis-Hastings estimates are equivalent to those of importance sampling, while keeping the Markov structure of the original chain. Given that 100 or more parallel threads can be harnessed from current GPU cards, this is clearly a field with much potential! The graph below gives the variance improvements brought by three Rao-Blackwell estimates taking advantage of parallelisation over the initial MCMC estimate (first entry), along with the importance sampling estimate (last entry), using only 10 parallel threads.
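To make the starting point concrete, here is a minimal Python sketch of a plain independent Metropolis-Hastings sampler in which all proposals and all target density evaluations are produced in one batch before the chain is run, which is the part that could be farmed out to parallel threads. It does not reproduce the Rao-Blackwellised estimators of the paper. The toy target (a standard normal), the proposal scale, the test function x² and the names `independent_mh`, `log_target` and `log_proposal` are all assumptions chosen for illustration. The same batch of weights also yields a self-normalised importance sampling estimate for comparison.

```python
import numpy as np

def log_target(x):
    """Unnormalised log density of the toy target N(0, 1) (assumed for illustration)."""
    return -0.5 * x**2

def log_proposal(x, scale):
    """Log density of the N(0, scale^2) proposal, up to an additive constant."""
    return -0.5 * (x / scale) ** 2 - np.log(scale)

def independent_mh(n_iter, scale=3.0, seed=0):
    rng = np.random.default_rng(seed)
    # Proposals do not depend on the current state, so they (and the
    # corresponding target evaluations) can be computed in one batch.
    proposals = rng.normal(0.0, scale, size=n_iter)
    log_w = log_target(proposals) - log_proposal(proposals, scale)
    log_u = np.log(rng.uniform(size=n_iter))

    # The accept/reject pass is sequential but only compares stored weights.
    chain = np.empty(n_iter)
    current, current_log_w = proposals[0], log_w[0]
    chain[0] = current
    for t in range(1, n_iter):
        if log_u[t] < log_w[t] - current_log_w:
            current, current_log_w = proposals[t], log_w[t]
        chain[t] = current
    return chain, proposals, log_w

chain, proposals, log_w = independent_mh(20000)

# Standard MCMC estimate of E[X^2] (true value 1 for the toy target).
mcmc_estimate = np.mean(chain**2)

# Self-normalised importance sampling estimate from the same evaluations.
w = np.exp(log_w - log_w.max())
is_estimate = np.sum(w * proposals**2) / np.sum(w)

print(mcmc_estimate, is_estimate)
```

Both estimates use exactly the same target density evaluations; the only difference lies in how those evaluations are recycled, which is the room the Rao-Blackwell estimates exploit.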
Since Bayes factor approximation is one of my areas of interest, I was intrigued by Xiao-Li Meng’s comments during my poster in Benidorm that I was using the “wrong” bridge sampling estimator when trying to bridge two models of different dimensions, based on the completion (for and missing from the first model)
When revising the normal chapter of Bayesian Core, here in CiRM, I thus went back to Xiao-Li’s papers on the topic to try to fathom what the “true” bridge sampling was in that case. In Meng and Schilling (2002, JASA), I found the following indication: “when estimating the ratio of normalizing constants with different dimensions, a good strategy is to bridge each density with a good approximation of itself and then apply bridge sampling to estimate each normalizing constant separately. This is typically more effective than to artificially bridge the two original densities by augmenting the dimension of the lower one”. I was unsure of the technique this (somewhat vague) indication pointed at until I understood that it meant introducing one artificial posterior distribution for each of the parameter spaces and processing each marginal likelihood as an integral ratio in itself. For instance, if is an arbitrary normalised density on , and is an arbitrary function, we have the bridge sampling identity on :
Therefore, the optimal choice of leads to the approximation
when and . More exactly, this approximation is replaced with an iterative version since it depends on the unknown . The choice of the density is obviously fundamental and it should be close to the true posterior to guarantee a good convergence of the approximation. Using a normal approximation to the posterior distribution of , a non-parametric approximation based on a sample from , or yet again an average of MCMC proposals are all reasonable choices.
The boxplot above compares this solution of Meng and Schilling (2002, JASA), called double (because two pseudo-posteriors and have to be introduced), with the solution of Chen, Shao and Ibrahim (2001) based on a single completion (using a normal centred at the estimate of the missing parameter, and with variance the estimate from the simulation), when testing whether or not the mean of a normal model with unknown variance is zero. The variabilities are quite comparable in this admittedly overly simple case. Overall, the performances of both extensions are obviously highly dependent on the choice of the completion factors, and on the one hand and on the other hand, . The performances of the first solution, which bridges both models via , are bound to deteriorate as the dimension gap between those models increases. The impact of the dimension of the models is less keenly felt for the other solution, as the approximation remains local.
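As a rough illustration of the "double" strategy for one of the two models, here is a minimal Python sketch of the iterative bridge sampling estimate of a single marginal likelihood, obtained by bridging the unnormalised posterior with a normalised pseudo-posterior. Everything specific in it is an assumption made for the example and is not taken from the post: the toy model (normal observations with unit variance and a N(0, τ²) prior on the mean, picked because its marginal likelihood is available in closed form), the names `bridge_log_marginal`, `log_post_unnorm` and `log_g`, the exact posterior draws standing in for MCMC output, and the inflated normal fit used as pseudo-posterior. The update is the standard iterative scheme of Meng and Wong (1996) with the optimal bridge function.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2), vectorised."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def bridge_log_marginal(log_post_unnorm, log_g, post_draws, g_draws, n_iter=100):
    """Iterative bridge sampling estimate of log m(x).

    post_draws come from the posterior, g_draws from the normalised
    pseudo-posterior g; the recursion is run on a shifted scale for stability.
    """
    n1, n2 = len(post_draws), len(g_draws)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    # Log ratios (unnormalised posterior) / g at both samples.
    log_l1 = log_post_unnorm(post_draws) - log_g(post_draws)
    log_l2 = log_post_unnorm(g_draws) - log_g(g_draws)
    shift = np.median(log_l1)
    l1, l2 = np.exp(log_l1 - shift), np.exp(log_l2 - shift)
    r = 1.0  # current value of m(x) * exp(-shift)
    for _ in range(n_iter):
        num = np.mean(l2 / (s1 * l2 + s2 * r))
        den = np.mean(1.0 / (s1 * l1 + s2 * r))
        r = num / den
    return np.log(r) + shift

# Toy model (assumed): x_i ~ N(theta, 1), theta ~ N(0, tau^2).
rng = np.random.default_rng(1)
tau = 2.0
x = rng.normal(0.5, 1.0, size=20)

def log_post_unnorm(theta):
    theta = np.atleast_1d(theta)
    log_prior = log_normal_pdf(theta, 0.0, tau)
    log_lik = log_normal_pdf(x[None, :], theta[:, None], 1.0).sum(axis=1)
    return log_prior + log_lik

# Exact conjugate posterior, used here as a stand-in for MCMC output.
post_var = 1.0 / (len(x) + 1.0 / tau**2)
post_mean = post_var * x.sum()
post_draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)

# Pseudo-posterior: normal fit to the draws, deliberately inflated.
g_mean, g_sd = post_draws.mean(), 1.5 * post_draws.std()
g_draws = rng.normal(g_mean, g_sd, size=5000)

def log_g(theta):
    return log_normal_pdf(theta, g_mean, g_sd)

log_m_hat = bridge_log_marginal(log_post_unnorm, log_g, post_draws, g_draws)
# Closed-form check: m(x) = prior * likelihood / posterior, evaluated at theta = 0.
log_m_exact = log_post_unnorm(0.0)[0] - log_normal_pdf(0.0, post_mean, np.sqrt(post_var))
print(log_m_hat, log_m_exact)
```

Running the same routine on the second model, with its own pseudo-posterior, and taking the difference of the two log estimates would give the log Bayes factor, without ever having to complete the smaller parameter space.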
Bernardo, José M. (Universitat de València, Spain)
Integrated objective Bayesian estimation and hypothesis testing. [discussion]
Consonni, Guido (Università di Pavia, Italy)
On moment priors for Bayesian model choice with applications to directed acyclic graphs. [discussion]
Frühwirth-Schnatter, Sylvia (Johannes Kepler Universität Linz, Austria)
Bayesian variable selection for random intercept modeling of Gaussian and non-Gaussian data. [discussion]
Lopes, Hedibert (University of Chicago, USA)
Particle learning for sequential Bayesian computation. [discussion]
Polson, Nicholas (University of Chicago, USA)
Shrink globally, act locally: Sparse Bayesian regularization and prediction. [discussion]
Wilkinson, Darren (University of Newcastle, UK)
Parameter inference for stochastic kinetic models of bacterial gene regulation: a Bayesian approach to systems biology. [discussion]
(with a possible incoming update on Mark Huber’s comments if we manage to get the simulations running in due time).