## probit posterior mean

Posted in Statistics, University life on March 9, 2012 by xi'an

In a recent arXiv report, Yuzo Maruyama shows that the posterior expectation of a probit parameter has an almost closed form (under a flat prior), namely

$\mathbb{E}[\beta|X,y] = (X^TX)^{-1} X^T\{2\text{diag}(y)-I_n\}\omega(X,y)$

where ω involves the integration of two quadratic forms over the n-dimensional unit sphere. Although this does not help directly with the MCMC derivation of the full posterior, it is an interesting lemma, showing a close proximity with the standard least-squares estimate in linear regression.
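One way to see where this form comes from (not necessarily Maruyama's own derivation) is the Albert and Chib (1993) data augmentation: with latent $z_i\sim N(x_i^T\beta,1)$ and $y_i=1$ exactly when $z_i>0$, a flat prior gives $\beta|z\sim N((X^TX)^{-1}X^Tz,(X^TX)^{-1})$, hence $\mathbb{E}[\beta|X,y]=(X^TX)^{-1}X^T\mathbb{E}[z|X,y]$. Since the sign of $z_i$ is $2y_i-1$, $\mathbb{E}[z|X,y]=\{2\text{diag}(y)-I_n\}\omega$ with $\omega_i=\mathbb{E}[|z_i|\,|X,y]$. The Python sketch below estimates ω by Gibbs sampling rather than by Maruyama's sphere integrals; the simulated data, run lengths and the crude rejection step for the truncated normals are all assumptions made for illustration.

```python
import numpy as np


def probit_gibbs(X, y, n_iter=2000, burn=500, seed=0):
    """Albert-Chib data-augmentation Gibbs sampler for probit regression
    under a flat prior on beta; returns post-burn-in beta and z draws."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.linalg.inv(X.T @ X)           # (X^T X)^{-1}
    L = np.linalg.cholesky(A)            # L @ N(0, I_p) ~ N(0, A)
    positive = y == 1                    # z_i > 0 exactly when y_i = 1
    beta = np.zeros(p)
    betas, zs = [], []
    for t in range(n_iter):
        mu = X @ beta
        # z_i ~ N(mu_i, 1) truncated to the half-line fixed by y_i, via
        # naive rejection (fine for a toy example, slow far in the tails)
        z = rng.normal(mu)
        bad = (z > 0) != positive
        while bad.any():
            z[bad] = rng.normal(mu[bad])
            bad = (z > 0) != positive
        # beta | z ~ N(A X^T z, A) under the flat prior
        beta = A @ (X.T @ z) + L @ rng.standard_normal(p)
        if t >= burn:
            betas.append(beta)
            zs.append(z)
    return np.array(betas), np.array(zs), A


# Simulated data (hypothetical): intercept plus one covariate
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ np.array([0.5, -1.0]) + rng.normal(size=n) > 0).astype(int)

betas, zs, A = probit_gibbs(X, y)
mean_from_draws = betas.mean(axis=0)          # ergodic average of beta
omega = np.abs(zs).mean(axis=0)               # Monte Carlo estimate of omega_i = E[|z_i| | X, y]
S = 2 * np.diag(y) - np.eye(n)                # 2 diag(y) - I_n
mean_from_formula = A @ X.T @ S @ omega       # (X^T X)^{-1} X^T {2 diag(y) - I_n} omega
print(mean_from_draws, mean_from_formula)
```

The formula-based estimate is the Rao-Blackwellised version of the plain average, so the two agree up to Monte Carlo noise, and the analogy with least squares is explicit: the observations $y$ are replaced by the signed latent magnitudes.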

## understanding computational Bayesian statistics: a reply from Bill Bolstad

Posted in Books, R, Statistics, University life on October 24, 2011 by xi'an

Bill Bolstad wrote a reply to my review of his book Understanding computational Bayesian statistics last week and here it is, unedited except for the first paragraph where he thanks me for the opportunity to respond, “so readers will see that the book has some good features beyond having a “nice cover”.” (!) I simply processed the Word document into an html output and put a Read More bar in the middle as it is fairly detailed. (As indicated at the beginning of my review, I am obviously biased on the topic: thus, I will not comment on the reply, lest we get into an infinite regress!)

The target audience for this book is upper-division undergraduate students and first-year graduate students in statistics whose prior statistical education has been mostly frequentist-based. Many will have knowledge of Bayesian statistics at an introductory level similar to that in my first book, but some will have had no previous Bayesian statistics course. Being self-contained, it will also be suitable for statistical practitioners without a background in Bayesian statistics.

The book aims to show that:

1. Bayesian statistics makes different assumptions from frequentist statistics, and these differences lead to the advantages of the Bayesian approach.
2. Finding the proportional posterior (the posterior up to its normalising constant) is easy; however, finding the exact posterior distribution is difficult in practice, even numerically, especially for models with many parameters.
3. Inferences can be based on a (random) sample from the posterior.
4. There are methods for drawing samples from the incompletely known posterior.
5. Direct reshaping methods become inefficient for models with a large number of parameters.
6. We can find a Markov chain that has the long-run distribution with the same shape as the posterior. A draw from this chain after it has run a long time can be considered a random draw from the posterior.
7. We have many choices in setting up a Markov chain Monte Carlo algorithm. The book shows the things that should be considered, and how problems can be detected from the sample output of the chain.
8. An independent Metropolis-Hastings chain with a suitable heavy-tailed candidate distribution will perform well, particularly for regression-type models. The book shows all the details needed to set up such a chain.
9. The Gibbs sampling algorithm is especially well suited for hierarchical models.
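Point 8 can be illustrated on a toy target. The following is a minimal sketch, assuming a one-parameter model (the log-odds θ of y = 7 successes in n = 20 trials under a flat prior on θ, all hypothetical) and a Student t candidate with 4 degrees of freedom centred at the posterior mode, with scale given by the inverse curvature there. This is one common recipe, not necessarily the book's exact one, and it does not use the Bolstad2 package. Because the candidate has polynomial tails while this target decays exponentially, the ratio of target to candidate stays bounded, which is what keeps the independence sampler well behaved.

```python
import math
import random


def log_post(theta, y=7, n=20):
    """Unnormalised log posterior of the log-odds theta (flat prior on theta)."""
    softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))  # log(1 + e^theta)
    return y * theta - n * softplus


def log_t(x, loc, scale, df):
    """Student t log density, up to an additive constant."""
    u = (x - loc) / scale
    return -0.5 * (df + 1) * math.log1p(u * u / df)


def draw_t(loc, scale, df, rng):
    # t = Z / sqrt(chi2_df / df), with chi2_df drawn as Gamma(df/2, scale 2)
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return loc + scale * z / math.sqrt(chi2 / df)


def independent_mh(n_iter=20000, y=7, n=20, df=4, seed=1):
    rng = random.Random(seed)
    p_hat = y / n
    loc = math.log(p_hat / (1 - p_hat))              # posterior mode under the flat prior
    scale = 1 / math.sqrt(n * p_hat * (1 - p_hat))   # inverse curvature of log_post at the mode
    theta = loc
    draws, accepted = [], 0
    for _ in range(n_iter):
        prop = draw_t(loc, scale, df, rng)
        # independence sampler: accept with prob min(1, pi(prop) q(theta) / (pi(theta) q(prop)))
        log_alpha = (log_post(prop, y, n) - log_post(theta, y, n)
                     + log_t(theta, loc, scale, df) - log_t(prop, loc, scale, df))
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            theta, accepted = prop, accepted + 1
        draws.append(theta)
    return draws, accepted / n_iter


draws, acc_rate = independent_mh()
print(sum(draws) / len(draws), acc_rate)
```

The candidate never depends on the current state, so a high acceptance rate directly signals a good match between candidate and posterior.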

I am satisfied that the book has achieved the goals that I set out above. The title "Understanding Computational Bayesian Statistics" explains what this book is about. I want the reader (who has a background in frequentist statistics) to understand how computational Bayesian statistics can be applied to models he/she is familiar with. I keep an up-to-date errata list on the book website. The website also contains the computer software used in the book, including Minitab macros and R functions. These were used because they had good data analysis capabilities that could be used in conjunction with the simulations. The website also contains Fortran executables that are much faster for models containing more parameters, and WinBUGS code for the examples in the book.

## recent arXiv postings

Posted in Statistics, University life on October 17, 2011 by xi'an

Three interesting recent arXiv postings and not enough time to read them all and in the ‘Og bind them! (Of course, comments from readers welcome!)

Formulating a statistical inverse problem as one of inference in a Bayesian model has great appeal, notably for what this brings in terms of coherence, the interpretability of regularisation penalties, the integration of all uncertainties, and the principled way in which the set-up can be elaborated to encompass broader features of the context, such as measurement error, indirect observation, etc. The Bayesian formulation comes close to the way that most scientists intuitively regard the inferential task, and in principle allows the free use of subject knowledge in probabilistic model building. However, in some problems where the solution is not unique, for example in ill-posed inverse problems, it is important to understand the relationship between the chosen Bayesian model and the resulting solution. Taking emission tomography as a canonical example for study, we present results about consistency of the posterior distribution of the reconstruction, and a general method to study convergence of posterior distributions. To study efficiency of Bayesian inference for ill-posed linear inverse problems with constraint, we prove a version of the Bernstein-von Mises theorem for nonregular Bayesian models.

(Certainly unlikely to please the member of the audience in Zürich who questioned my Bayesian credentials for considering “true” models and consistency….)

Recently, Andrieu, Doucet and Holenstein (2010) introduced a general framework for using particle filters (PFs) to construct proposal kernels for Markov chain Monte Carlo (MCMC) methods. This framework, termed Particle Markov chain Monte Carlo (PMCMC), was shown to provide powerful methods for joint Bayesian state and parameter inference in nonlinear/non-Gaussian state-space models. However, the mixing of the resulting MCMC kernels can be quite sensitive, both to the number of particles used in the underlying PF and to the number of observations in the data. In this paper we suggest alternatives to the three PMCMC methods introduced in Andrieu et al. (2010), which are much more robust to a low number of particles as well as a large number of observations. We consider some challenging inference problems and show in a simulation study that, for problems where existing PMCMC methods require around 1000 particles, the proposed methods provide satisfactory results with as few as 5 particles.

(I have not read the paper in enough depth to be critical; however, "hard" figures like 5 or 10³ are always suspicious, in that they cannot carry over to the general case…)
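The sensitivity of PMCMC mixing to the number of particles stems from the noise in the particle-filter likelihood estimate that, in the particle marginal Metropolis-Hastings version, replaces the exact likelihood in the acceptance ratio. The following is a minimal sketch of a generic bootstrap particle filter, not the alternatives proposed in the paper, on a hypothetical linear-Gaussian AR(1) state-space model where the Kalman filter provides the exact log-likelihood for comparison. It shows how the spread of the log-likelihood estimate shrinks as the number of particles N grows.

```python
import numpy as np


def simulate(T=50, phi=0.8, sx=1.0, sy=1.0, seed=0):
    """Hypothetical model: x_1 ~ N(0, sx^2/(1-phi^2)), x_t = phi x_{t-1} + sx e_t,
    y_t = x_t + sy u_t, with e_t, u_t iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sx / np.sqrt(1 - phi**2))
    ys = []
    for _ in range(T):
        ys.append(x + sy * rng.normal())
        x = phi * x + sx * rng.normal()
    return np.array(ys)


def kalman_loglik(ys, phi=0.8, sx=1.0, sy=1.0):
    """Exact log-likelihood via the Kalman filter."""
    m, P = 0.0, sx**2 / (1 - phi**2)           # predictive mean and variance of x_1
    ll = 0.0
    for y in ys:
        S = P + sy**2                          # predictive variance of y_t
        ll += -0.5 * (np.log(2 * np.pi * S) + (y - m) ** 2 / S)
        K = P / S
        m, P = m + K * (y - m), (1 - K) * P    # filtering update
        m, P = phi * m, phi**2 * P + sx**2     # predict x_{t+1}
    return ll


def pf_loglik(ys, N, rng, phi=0.8, sx=1.0, sy=1.0):
    """Bootstrap particle filter estimate of the log-likelihood."""
    x = rng.normal(0.0, sx / np.sqrt(1 - phi**2), size=N)   # particles for x_1
    ll = 0.0
    for y in ys:
        logw = -0.5 * (np.log(2 * np.pi * sy**2) + (y - x) ** 2 / sy**2)
        c = logw.max()
        w = np.exp(logw - c)
        ll += c + np.log(w.mean())                   # log of the average weight
        idx = rng.choice(N, size=N, p=w / w.sum())   # multinomial resampling
        x = phi * x[idx] + sx * rng.normal(size=N)   # propagate through the prior
    return ll


ys = simulate()
exact = kalman_loglik(ys)
rng = np.random.default_rng(1)
results = {N: np.array([pf_loglik(ys, N, rng) for _ in range(50)]) for N in (10, 100, 1000)}
for N, est in results.items():
    print(N, exact, est.mean(), est.std())
```

A large spread of the estimate at small N is what makes a marginal Metropolis-Hastings chain get stuck after an overestimated likelihood, which is the regime the paper's alternatives are meant to address.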

In this paper we present an algorithm for rapid Bayesian analysis that combines the benefits of nested sampling and artificial neural networks. The blind accelerated multimodal Bayesian inference (BAMBI) algorithm implements the MultiNest package for nested sampling as well as the training of an artificial neural network (NN) to learn the likelihood function. In the case of computationally expensive likelihoods, this allows the substitution of a much more rapid approximation in order to increase significantly the speed of the analysis. We begin by demonstrating, with a few toy examples, the ability of a NN to learn complicated likelihood surfaces. BAMBI’s ability to decrease running time for Bayesian inference is then demonstrated in the context of estimating cosmological parameters from WMAP and other observations. We show that valuable speed increases are achieved in addition to obtaining NNs trained on the likelihood functions for the different model and data combinations. These NNs can then be used for an even faster follow-up analysis using the same likelihood and different priors. This is a fully general algorithm that can be applied, without any pre-processing, to other problems with computationally expensive likelihood functions.

(This is primarily an astronomy paper that uses a sample produced by the nested sampling algorithm MultiNest to train a neural network that then stands in for the model likelihood. The algorithm thus requires the likelihood to be available at some stage.)

## understanding computational Bayesian statistics

Posted in Books, R, Statistics, University life on October 10, 2011 by xi'an

I have just finished reading this book by Bill Bolstad (University of Waikato, New Zealand) which a previous ‘Og post pointed out when it appeared, shortly after our Introducing Monte Carlo Methods with R. My family commented that the cover was nicer than those of my own books, which is true. Before I launch into a review, let me warn the ‘Og reader that, as an author of three books on computational Bayesian statistics, I cannot be very objective on the topic: I do favour the way we approached Bayesian computational methods and, after reading Bolstad’s Understanding computational Bayesian statistics, would still have written the books the way we did. Be warned, thus.

Understanding computational Bayesian statistics covers the basics of Monte Carlo and (fixed-dimension) Markov chain Monte Carlo methods, with a fair chunk dedicated to prerequisites in Bayesian statistics and Markov chain theory. Even though I have only glanced at the table of contents of Bolstad's Introduction to Bayesian Statistics [using almost the same nice whirl picture albeit in bronze rather than cobalt], it seems to me that the current book is the continuation of the earlier one, going beyond the Binomial, Poisson, and normal cases, to cover generalised linear models, via MCMC methods. (In this respect, it corresponds to Chapter 4 of Bayesian Core.) The book is associated with Minitab macros and an R package, Bolstad2 (written by James Curran), which continues the Bolstad package written for Introduction to Bayesian Statistics. Overall, the level of the book is such that it should be accessible to undergraduate students, MCMC methods being reduced to Gibbs, random walk and independent Metropolis-Hastings algorithms, and convergence assessments being done via autocorrelation graphs, the Gelman and Rubin (1992) intra-/inter-variance criterion, and a forward coupling device. The illustrative chapters cover logistic regression (Chap. 8), Poisson regression (Chap. 9), and normal hierarchical models (Chap. 10). Again, the overall feeling is that the book should be understandable to undergraduate students, even though it may make MCMC seem easier than it is by sticking to fairly regular models. In a sense, it is more a book of the [roaring MCMC] 90's in that it does not incorporate advances from 2000 onwards (as seen from the reference list) like adaptive MCMC and the resurgence of importance sampling via particle systems and sequential Monte Carlo.
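The Gelman and Rubin (1992) intra-/inter-variance criterion compares the spread between parallel chains with the spread within each chain. The following is a minimal sketch of the basic potential scale reduction factor (without the degrees-of-freedom correction of the original paper), applied to hypothetical iid "chains" rather than actual MCMC output, purely to show the two regimes.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor for an (m, n) array holding m chains
    of length n (basic version, no degrees-of-freedom correction)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain (intra) variance
    B = n * means.var(ddof=1)               # between-chain (inter) variance
    var_hat = (n - 1) / n * W + B / n       # pooled estimate of the target variance
    return np.sqrt(var_hat / W)


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                                   # 4 chains on the same target
stuck = rng.normal(size=(4, 1000)) + np.array([[0], [0], [3], [3]])  # 2 chains stuck elsewhere
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values close to 1 are consistent with (but do not prove) convergence, while the shifted chains push the factor to about 2.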

## Death sequence

Posted in Books, Statistics, University life on August 22, 2010 by xi'an