## Particle learning [rejoinder]

Posted in R, Statistics, University life with tags , , , , , , , , , , on November 10, 2010 by xi'an

Following the posting on arXiv of the Statistical Science paper of Carvalho et al., and the publication by the same authors in Bayesian Analysis of Particle Learning for general mixtures I noticed on Hedibert Lopes’ website his rejoinder to the discussion of his Valencia 9 paper has been posted. Since the discussion involved several points made by members of the CREST statistics lab (and covered the mixture paper as much as the Valencia 9 paper), I was quite eager to read Hedie’s reply. Unsurprisingly, this rejoinder is however unlikely to modify my reservations about particle learning. The following is a detailed examination of the arguments found in the rejoinder but requires a preliminary reading of the above papers as well as our discussion.. Continue reading

## Parallel processing of independent Metropolis-Hastings algorithms

Posted in R, Statistics, University life with tags , , , , , , , , on October 12, 2010 by xi'an

With Pierre Jacob, my PhD student, and Murray Smith, from National Institute of Water and Atmospheric Research, Wellington, who actually started us on this project at the last and latest Valencia meeting, we have completed a paper on using parallel computing in independent Metropolis-Hastings algorithms. The paper is arXived and the abstract goes as follows:

In this paper, we consider the implications of the fact that parallel raw-power can be exploited by a generic Metropolis–Hastings algorithm if the proposed values are independent. In particular, we present improvements to the independent Metropolis–Hastings algorithm that significantly decrease the variance of any estimator derived from the MCMC output, for a null computing cost since those improvements are based on a fixed number of target density evaluations. Furthermore, the techniques developed in this paper do not jeopardize the Markovian convergence properties of the algorithm, since they are based on the Rao–Blackwell principles of Gelfand and Smith (1990), already exploited in Casella and Robert 91996), Atchadé and Perron (2005) and Douc and Robert (2010). We illustrate those improvement both on a toy normal example and on a classical probit regression model but insist on the fact that they are universally applicable.

I am quite excited about the results in this paper, which took advantage of (a) older works of mine on Rao-Blackwellisation, (b) Murray’s interests in costly likelihoods, and (c) our mutual excitement when hearing about GPU parallel possibilities from Chris Holmes’ talk in Valencia. (As well as directions drafted in an exciting session in Vancouver!) The (free) gains over standard independent Metropolis-Hastings estimates are equivalent to using importance sampling gains, while keeping the Markov structure of the original chain. Given that 100 or more parallel threads can be enhanced from current GPU cards, this is clearly a field with much potential! The graph below

gives the variance improvements brought by three Rao-Blackwell estimates taking advantage of parallelisation over the initial MCMC estimate (first entry) with the importance sampling estimate (last entry) using only 10 parallel threads.

## Multidimension bridge sampling (CoRe in CiRM [5])

Posted in Books, R, Statistics, University life with tags , , , , , on July 14, 2010 by xi'an

Since Bayes factor approximation is one of my areas of interest, I was intrigued by Xiao-Li Meng’s comments during my poster in Benidorm that I was using the “wrong” bridge sampling estimator when trying to bridge two models of different dimensions, based on the completion (for $\theta_2=(\mu,\sigma^2)$ and $\mu=\theta_1$ missing from the first model)

$B^\pi_{12}(x)= \dfrac{\displaystyle{\int\pi_1^*(\mu|\sigma^2){\tilde\pi}_1(\sigma^2|x) \alpha(\theta_2) {\pi}_2(\theta_2|x)\hbox{d}\theta_2}}{ \displaystyle{\int{\tilde\pi}_2(\theta_2|x)\alpha(\theta_2) \pi_1(\sigma^2|x)\hbox{d}\sigma^2 } \pi_1^*(\mu|\sigma^2) \hbox{d}\mu }\,.$

When revising the normal chapter of Bayesian Core,  here in CiRM, I thus went back to Xiao-Li’s papers on the topic to try to fathom what the “true” bridge sampling was in that case. In Meng and Schilling (2002, JASA), I found the following indication, “when estimating the ratio of normalizing constants with different dimensions, a good strategy is to bridge each density with a good approximation of itself and then apply bridge sampling to estimate each normalizing constant separately. This is typically more effective than to artificially bridge the two original densities by augmenting the dimension of the lower one”. I was unsure of the technique this (somehow vague) indication pointed at until I understood that it meant  introducing one artificial posterior distribution for each of the parameter spaces and processing each marginal likelihood as an integral ratio in itself. For instance, if $\eta_1(\theta_1)$ is an arbitrary normalised density on $\theta_1$, and $\alpha$ is an arbitrary function, we have the bridge sampling identity on $m_1(x)$:

$\int\tilde{\pi}_1(\theta_1|x) \,\text{d}\theta_1 = \dfrac{\displaystyle{\int \tilde{\pi}_1(\theta_1|x) \alpha(\theta_1) {\eta}_1(\theta_1)\,\text{d}\theta_1}}{\displaystyle{\int\eta_1(\theta_1) \alpha(\theta_1) \pi_1(\theta_1|x) \,\text{d}\theta_1}}$

Therefore, the optimal choice of $\alpha$ leads to the approximation

$\widehat m_1(x) = \dfrac{\displaystyle{\sum_{i=1}^N {\tilde\pi}_1(\theta^\eta_{1i}|x)\big/\left\{{m_1(x) \tilde\pi}_1(\theta^\eta_{1i}|x) + \eta(\theta^\eta_{1i})\right\}}}{\displaystyle{ \sum_{i=1}^{N} \eta(\theta_{1i}) \big/ \left\{{m_1(x) \tilde\pi}_1(\theta_{1i}|x) + \eta(\theta_{1i})\right\}}}$

when $\theta_{1i}\sim\pi_1(\theta_1|x)$ and $\theta^\eta_{1i}\sim\eta(\theta_1)$. More exactly, this approximation is replaced with an iterative version since it depends on the unknown $m_1(x)$. The choice of the density $\eta$ is obviously fundamental and it should be close to the true posterior $\pi_1(\theta_1|x)$ to guarantee good convergence approximation. Using a normal approximation to the posterior distribution of $\theta$ or a non-parametric approximation based on a sample from $\pi_1(\theta_1|\mathbf{x})$, or yet again an average of MCMC proposals are reasonable choices.

The boxplot above compares this solution of Meng and Schilling (2002, JASA), called double (because two pseudo-posteriors $\eta_1(\theta_1)$ and $\eta_2(\theta_2)$ have to be introduced), with Chen, Shao and Ibragim (2001) solution based on a single completion $\pi_1^*$ (using a normal centred at the estimate of the missing parameter, and with variance the estimate from the simulation), when testing whether or not the mean of a normal model with unknown variance is zero. The variabilities are quite comparable in this admittedly overly simple case. Overall, the performances of both extensions are obviously highly dependent on the choice of the completion factors, $\eta_1$ and $\eta_2$ on the one hand and $\pi_1^*$ on the other hand, . The performances of the first solution, which bridges both models via $\pi_1^*$, are bound to deteriorate as the dimension gap between those models increases. The impact of the dimension of the models is less keenly felt for the other solution, as the approximation remains local.

Posted in Statistics, University life with tags , , , , on June 23, 2010 by xi'an

Following discussions at CREST, we have contributed comments on the following papers

Bernardo, José M. (Universitat de València, Spain)
Integrated objective Bayesian estimation and hypothesis testing. [discussion]

Consonni, Guido (Università di Pavia, Italy)
On moment priors for Bayesian model choice with applications to directed acyclic graphs. [discussion]

Frühwirth-Schnatter, Sylvia (Johannes Kepler Universität Linz, Austria)
Bayesian variable selection for random intercept modeling of Gaussian and non-Gaussian data. [discussion]

Huber, Mark (Claremont McKenna College, USA)
Using TPA for Bayesian inference. [discussion]

Lopes, Hedibert (University of Chicago, USA)
Particle learning for sequential Bayesian computation. [discussion]

Polson, Nicholas (University of Chicago, USA)
Shrink globally, act locally: Sparse Bayesian regularization and prediction. [discussion]

Wilkinson, Darren (University of Newcastle, UK)
Parameter inference for stochastic kinetic models of bacterial gene regulation: a Bayesian approach to systems biology. [discussion]

(with a possible incoming update on Mark Huber’s comments if we manage to get the simulations running in due time).

## Vanilla Rao-Blackwellisation accepted

Posted in Statistics, University life with tags , , on June 13, 2010 by xi'an

The revision of our Vanilla Rao-Blackwellisation paper has been accepted by the Annals of Statistics as I was leaving for València 9. This is a very good news, indeed! Especially because I came back from València 9 with an idea on how to extend the Rao-Blackwellisation…

## València 9 papers on line

Posted in Books, Statistics, University life with tags , , on June 11, 2010 by xi'an

Just received this email from José Bernardo:

The pdf files of the Valencia 9 invited papers are now available online at the conference webpage, as a link placed by the author name in the V9 invited program list. These are the  last version sent to me by the author, and will be substituted by more current ones as they become available.

I remind you that  you are encouraged to submit written contributions to the discussion of any of these 24 papers even if you could not attend the meeting. Your discussions should be directly emailed by June 28th  to the author(s) of the invited papers, with a copy to me. I will also need the LaTeX source and the eps files of any figures used. Contributions should not exceed six typeset pages (including figures) for invited discussions, and three pages for contributed discussions.

This means anyone can send discussions on the papers presented at the meeting, to be published soon in the Valencia 9 proceedings by Oxford University Press. We are just out of a post-conference meeting with our students and colleagues here at CREST, where we discussed the invited papers by Ickstadt, Nicholls (actually, sadly not open to written discussions!), Meek, and Wilkinson . (On Monday, we plan to cover Dunson, Früwirth-Schnatter, Lopes, Polson, and Vanucci.)

## Impresiónes de València 9

Posted in Mountains, pictures, Statistics, Travel, University life with tags , , , , , , on June 10, 2010 by xi'an

The València 9 meeting in Benidorm is now over, even for those who stay till the end of the party (!)… In retrospect, I found the scientific quality of this last meeting of the series quite high and I am thus sad this series comes to an end. This mythical gathering of “true believers” on a Valencianos beach town certainly had a charm not found in other meetings (even though I have no particular love of beaches, of beach towns or of cabarets) in that it brought people really together for a rather long time in an intense and sometime heated exchange of ideas. (This secluded perspective of course reinforced the caricatures of Bayesians as sectarians!) This was particularly true this time as the huge majority of people stayed in the same (awful) hotel. Also, the fact that there was no parallel sessions was a major factor to keep people together… (The fact that the afternoon sessions were administered by ISBA rather than the València 9 scientific committee had the drawback of sometimes producing similar talks.) In my personal view, there were somehow too many non-parametric and sparsity sessions/talks, but this follows the research trends in the community (after all in the 1994 meeting, there were also “too many” MCMC talks!) And the discussions from the floor were much more limited than in the earlier meetings (but most invited discussions were a clear added value to the talks). Maybe this is due to the growing Bayesian community. As in earlier editions, the poster sessions were a strong moment with the frustrating drawback of having too many posters in a single session to allow for a complete coverage (unless you were ready to stay up till 2am…) Again a consequence of the size of the audience. But it was a pleasure to see how Bayesian statistics was well and alive and how the community was bridging old-timers having attending all of the nine Valencia meetings with newcomers still writing their PhD. (Congrats to Emily Fox and to James Scott for their respective Savage awards!)

Darren Wilkinson also gives an overview of the “last Valencia meeting” on his blog. This post includes a detailed analysis of the GPU solution enthusiatically defended by Chris Holmes. Since I came back from the meeting with ideas towards parallel accelerations for MCMC algorithms, I will look carefully at his arguments.