## Two more handbook chapters

Posted in Books, Statistics with tags , , , , , , , , , on February 16, 2010 by xi'an

As mentioned in my earlier post, I had to write a revised edition to my chapter Bayesian Computational Methods in the Handbook of Computational Statistics (second edition), edited by J. Gentle, W. Härdle and Y. Mori. And in parallel I was asked for a second chapter in a handbook on risk analysis, Bayesian methods and expert elicitation, edited by Klaus Böckner. So, on Friday, I went over the first edition of this chapter of the Handbook of Computational Statistics and added the most recent developments I deemed important to mention, like ABC, as well as recent SMC and PMC algorithms, increasing the length by about ten pages. Simultaneously, Jean-Michel Marin completed my draft for the other handbook and I submitted both chapters, as well as arXived one and then the other.

It is somehow interesting (on a lazy blizzardly Sunday afternoon with nothing better to do!) to look for the differences between those chapters aiming at the same description of important computational techniques for Bayesian statistics (and based on the same skeleton). The first chapter is broader and, with its  60 pages, it functions as a (very) short book on the topic. Given that the first version was written in 2003, the focus is more on latent variables with mixture models being repeatedly used as examples. Reversible jump also stands preeminently. In my opinion, it reads well and could be used as a primary entry for a short formal course on computational methods. (Even though Introducing Monte Carlo Methods with R is presumably more appropriate for a short course.)

The second chapter started from the skeleton of the earlier version of the first chapter with the probit model as the benchmark example. I worked on a first draft during the last vacations and then Jean-Michel took over to produce this current version, where reversible jump has been removed and ABC introduced with greater details. In particular, we used a very special version of ABC with the probit model, resorting to the distance between the expectations of the binary observables, namely

$\sum_{j=1}^n (\Phi(x_j^\text{T} \beta) - \Phi(x_j^\text{T} \hat\beta) )^2,$

where $\hat\beta$ is the MLE of $\beta$ based on the observations, instead of the difference between the simulated and the observed binary observables

$\sum_{j=1}^n (y_j-y_j^0)^2$

which incorporates a useless randomness. With this choice, and when using for $\epsilon$ a .01 quantile, the difference with the true posterior on $\beta$ is very small, as shown by the figure (obtained for the Pima Indian dataset in R). Obviously, this stabilising trick only works in specific situations where a predictive of sorts can be computed.

## Vanilla Rao-Blackwellisation updated

Posted in Statistics with tags , , , , , on October 26, 2009 by xi'an

The vanilla Rao-Blackwellisation paper with Randal Douc has been updated, arXived, and resubmitted to the Annals, to include a more detailed study of the variance improvement brought by the Rao-Blackwellisation. In particular, we considered a probit modelling of the Pima Indian diabetes study, which is a standard benchmark that I also used in other papers and in the book Introducing Monte Carlo Methods with R with George Casella,. We also included the reference to the paper by Malefaki and Iliopoulos about Lemma 1, discussed in a previous post. The assessment of the variance improvement is examined in terms of the empirical variances of the individual terms

$h(z_j) \sum_{t=0}^\infty \prod_{m=1}^{t-1}\{1- \alpha(z_j,y_{jm})\}=h(z_j) \xi_j$

and

$h(z_j) \sum_{t=0}^\infty \prod_{m=1}^{t-1}\mathbb{I}\{u_{jm}\ge \alpha(z_j,y_{jm})\}=h(z_j) \mathfrak{n}_j$

involved in the two versions of the estimators, as explained in the original post (with a corrected typo due to the missing product in both terms!) and the variance ratio may be as small as two for some examples, including the Pima Indian diabetes. The improvement in the variances of $\xi_j$ against $\mathfrak{n}_j$ is most clearly seen in the boxplot below, in a exponential toy example where the acceptance rate is 1/13:

A phenomenon about boxplots that I cannot truly explain is that while the variance of the Rao-Blackwellised estimate of $\mathbb{E}[X]$ is 80% of the variance of the original estimator, the improvement is not that visible on the boxplot below of three samples of estimates (the middle one is our Rao-Blackwellised candidate, whose name vanished for being too large):

Now a 20% decrease in the variance means a mere 10% decrease in the standard deviation so this could agree with the difference in the upper bars of the boxplot… Since this is an example where the acceptance probability can be computed as well, I also added the optimal importance sampling estimator on the above boxplot and this confirms the impression that, while the variance decrease may be significant, this is not so clearly visible on the above boxplot.

We also took advantage of the fact that the quantity

$\alpha(z_j,y_{j0}) \sum_{t=0}^\infty \prod_{m=1}^{t-1}\{1- \alpha(z_j,y_{jm})\}$

where $y_{j0}$ is an additional simulation from the proposal $q(z_j,y)$ is a universal control variate for Metropolis-Hastings algorithms since its expectation is one. This means that in practice we can compute the regression coefficient $\hat\beta$ of $h(z_j)\xi_j$ on $\alpha(z_j,y_{j0})\xi_j$ and replace the average of the $h(z_j)\xi_j$‘s with the controled version

$\frac{1}{N} \sum_j h(z_j)\xi_j - \hat\beta(\alpha(z_j,y_{j0})\xi_j-1)$

with, again, improvement in the empirical variances that can reach a factor of 2.