Archive for October, 2011

Decision systems and nonstochastic randomness

Posted in Books, Statistics, University life on October 26, 2011 by xi'an

“Thus the informativity of stochastic experiment turned out to depend on the Bayesian system and to coincide to within the scale factor with the previous ‘value of information’.” V. Ivanenko, Decision systems and nonstochastic randomness, p.208

This book, Decision systems and nonstochastic randomness, written by the Ukrainian researcher Victor Ivanenko, is related to decision theory and information theory, albeit with a statistical component. It however works at a fairly formal level and the reading is certainly not light. The randomness it addresses is the type formalised by Andreï Kolmogorov (also covered in the book Randomness through Computation I [rather negatively] reviewed a few months ago, inducing angry comments and scathing criticisms in the process). The terminology is slightly different from the usual one, but the basics are those of decision theory as in DeGroot (1970). However, the tone quickly gets much more mathematical and the book lost me early in Chapter 3 (Indifferent uncertainty) on a casual reading. The following chapter on non-stochastic randomness reminded me of von Mises for its use of infinite sequences, and of the above book for its purpose, but otherwise offered an uninterrupted array of definitions and theorems that sounded utterly remote from statistical problems. After failing to make sense of the chapter on the informativity of experiments in Bayesian decision problems, I simply gave up… I thus cannot judge from this cursory reading whether or not the book is “useful in describing real situations of decision-making” (p.208). It just sounds very remote from my centres of interest. (Anyone interested in writing a review?)

Catching up faster by switching sooner

Posted in R, Statistics, University life on October 26, 2011 by xi'an

Here is our discussion (with Nicolas Chopin) of the Read Paper of last Wednesday by T. van Erven, P. Grünwald and S. de Rooij (Centrum voor Wiskunde en Informatica, Amsterdam), entitled Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the Akaike information criterion–Bayesian information criterion dilemma. It is still open for written discussions, to be published in Series B. Even though the topic is quite tangential to our interests, the fact that the authors evolve in a Bayesian environment called for the following (my main contribution being to point out that the procedure is not Bayesian, by failing to incorporate the switch in the predictive (6), hence using the same data for all models under competition…):

Figure 1 – Bayes factors of Model 2 vs. Model 1 (gray line) and Model 3 vs. Model 1 (dark line), plotted against the number of observations, i.e. of iterations, when comparing three stochastic volatility models; see Chopin et al. (2011) for full details.

This paper is an interesting attempt at a particularly important problem. We nonetheless believe more classical tools should be used instead if the models are truly relevant to the inference at hand: Figure 1, reproduced from Chopin et al. (2011), plots [against time] the Bayes factors of Models 2 and 3 vs. Model 1, where all models are state-space models of increasing complexity, fitted to some real data. In this context, one often observes that more complex models need more time to “ascertain themselves”. On the other hand, even BMA-based prediction is a very challenging computational problem (the only generic solution currently being the SMC² algorithm of the aforementioned paper), and we believe that the proposed predictive strategy will remain too computationally expensive for practical use with nonlinear state-space models.

For other classes of models, since the provable methods put forward by this paper are based on “frozen strategies”, which are hard to defend from a modelling perspective, and since the more reasonable “basic switch” strategy seems to perform as well numerically, we would be curious to see how the proposed methods compare to predictive distributions obtained from genuine Bayesian models. A true change-point model, for instance, would generate a coherent prediction strategy, which is not equivalent to the basic switch strategy. (Indeed, for one thing, the proposal made by the authors utilises the whole past to compute the switching probabilities, rather than allocating the proper portion of the data to the relevant model. In this sense, the proposal is “using the data [at least] twice” in a pseudo-Bayesian setting, similar to Aitkin’s, 1991.) More generally, the authors seem to focus on situations where the true generative process belongs to a non-parametric class and the models entertained form an infinite sequence of richer and richer—but also more and more complex—parametric models, which is a very sensible set-up in practice. We then wonder whether or not it would make more sense to set the prior distribution over the switch parameter s in such a way that (a) switches only occur from one model to another model of greater complexity and (b) the number of switches is infinite.
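To make the sequential Bayes factor computation concrete, here is a minimal R sketch of how plots like Figure 1 are produced—using a toy pair of conjugate normal models rather than the stochastic volatility models of the figure (which require SMC²), and an arbitrary data mean of 0.3:

```r
## Minimal sketch: Bayes factors against the number of observations, as in
## Figure 1, but for two toy conjugate models rather than state-space models:
## M1: x ~ N(0,1) versus M2: x ~ N(theta,1) with prior theta ~ N(0,1).
set.seed(42)
n <- 500
x <- rnorm(n, mean = 0.3)        # data slightly off zero, hence favouring M2

## log marginal likelihood of M1, cumulated over observations
logm1 <- cumsum(dnorm(x, 0, 1, log = TRUE))

## log marginal likelihood of M2 via the sequential (prequential)
## decomposition: predictive of x_i given the past is N(mu, tau2 + 1)
logm2 <- numeric(n)
mu <- 0; tau2 <- 1
for (i in 1:n) {
  logm2[i] <- dnorm(x[i], mu, sqrt(tau2 + 1), log = TRUE)
  prec <- 1 / tau2 + 1                  # conjugate update of the posterior
  mu   <- (mu / tau2 + x[i]) / prec     # on theta after observing x_i
  tau2 <- 1 / prec
}
logm2 <- cumsum(logm2)

plot(exp(logm2 - logm1), type = "l", log = "y",
     xlab = "number of observations", ylab = "Bayes factor of M2 vs M1")
```

Even in this toy case, the more complex model M2 typically needs a while before its Bayes factor takes off, echoing the “ascertain themselves” remark above.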

For ABC readers, note the forthcoming Read Paper meeting on December 14, with a paper by Paul Fearnhead and Dennis Prangle.

Approximate Bayesian computational methods on-line

Posted in R, Statistics, University life on October 25, 2011 by xi'an

Fig. 4 – Boxplots of the evolution [against ε] of ABC approximations to the Bayes factor. The representation is made in terms of frequencies of visits to [accepted proposals from] models MA(1) and MA(2) during an ABC simulation, when ε corresponds to the 10, 1, 0.1, and 0.01% quantiles of the simulated autocovariance distances. The data is a time series of 50 points simulated from an MA(2) model. The true Bayes factor is then equal to 17.71, corresponding to posterior probabilities of 0.95 and 0.05 for the MA(2) and MA(1) models, respectively.

The survey we wrote with Jean-Michel Marin, Pierre Pudlo, and Robin Ryder is now published in [the expensive] Statistics and Computing (on-line). Besides recycling a lot of Og posts on ABC, this paper has the (personal) appeal of giving us the first hint that all was not so rosy in terms of ABC model choice. I wonder whether or not it will be part of the ABC special issue.
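For readers curious about the mechanics behind Fig. 4, here is a minimal R sketch of ABC model choice between MA(1) and MA(2) models. The priors, the number of simulations, and the use of the first two empirical autocovariances as summaries are assumed choices for this toy rendering, not the exact code behind the figure:

```r
## Toy sketch of ABC model choice between MA(1) and MA(2), using the first
## two empirical autocovariances as summary statistics (assumed choices,
## not the exact setup behind Fig. 4).
set.seed(17)
n <- 50
obs <- arima.sim(n = n, model = list(ma = c(0.6, 0.2)))   # MA(2) data

sumstat <- function(x)   # autocovariances at lags 1 and 2
  acf(x, lag.max = 2, type = "covariance", plot = FALSE)$acf[2:3]
s_obs <- sumstat(obs)

N <- 1e4
m <- sample(1:2, N, replace = TRUE)     # uniform prior on the model index
dist <- numeric(N)
for (i in 1:N) {
  if (m[i] == 1) {
    th <- runif(1, -1, 1)               # uniform prior over invertible MA(1)
  } else {
    repeat {                            # uniform prior over the MA(2)
      th <- runif(2, c(-2, -1), c(2, 1))         # invertibility triangle
      if (sum(th) > -1 && th[2] - th[1] > -1) break
    }
  }
  z <- arima.sim(n = n, model = list(ma = th))
  dist[i] <- sum((sumstat(z) - s_obs)^2)
}

## ABC approximation of the posterior probabilities of MA(1) and MA(2)
## when epsilon is set to the 1% quantile of the simulated distances
eps <- quantile(dist, 0.01)
table(m[dist <= eps]) / sum(dist <= eps)
```

Rerunning the last two lines with 10%, 0.1%, and 0.01% quantiles mimics the evolution against ε displayed in the boxplots.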

Selecting statistics for [ABC] Bayesian model choice

Posted in Statistics, University life on October 25, 2011 by xi'an

At last, we have completed, arXived, and submitted our paper on the evaluation of summary statistics for Bayesian model choice! (I had presented preliminary versions at the recent workshops in New York and Zürich.) While broader in scope, the results obtained by Judith Rousseau, Jean-Michel Marin, Natesh Pillai, and myself bring an answer to the question raised by our PNAS paper on ABC model choice. Almost as soon as we realised the problem, that is, during MCMC’Ski in Utah, I talked with Judith about a possible classification of statistics in terms of their Bayes factor performances and we started working on that… While the idea of separating the mean behaviour of the statistics under both models came rather early, establishing a complete theoretical framework that validated this intuition took quite a while, and the assumptions changed a few times around the summer. The simulations associated with the paper were straightforward in that (a) the setup had been suggested to us by a referee of our PNAS paper: compare normal and Laplace distributions with different summary statistics (including the median absolute deviation), (b) the theoretical results told us what to look for, and (c) they did very clearly exhibit the consistency and inconsistency of the Bayes factor/posterior probability predicted by the theory. Both boxplots shown here exhibit this agreement: when using the (empirical) mean, median, and variance to compare the normal and Laplace models, the posterior probabilities do not select the “true” model but instead aggregate near a fixed value. When using instead the median absolute deviation as summary statistic, the posterior probabilities concentrate near one or zero depending on whether or not the normal model is the true model.

The main result states that, under some “heavy-duty” assumptions, (a) if the “true” mean of the summary statistic can be recovered for both models under comparison, then the Bayes factor has the same asymptotic behaviour as n to the power -(d1 – d2)/2, irrespective of which one is the true model. (The dimensions d1 and d2 are the effective dimensions of the asymptotic means of the summary statistic under both models.) Therefore, the Bayes factor always asymptotically selects the model having the smallest effective dimension and cannot be consistent. (b) If, instead, the “true” mean of the summary statistic cannot be represented in the other model, then the Bayes factor is consistent. This means that, somehow, the best statistics to be used in an ABC approximation to a Bayes factor are ancillary statistics with different mean values under both models. Otherwise, the summary statistic must have enough components to prevent a parameter under the “wrong” model from matching the “true” mean of the summary statistic.
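To make the two regimes concrete, here is a minimal R sketch of the normal-versus-Laplace experiment. The prior on the location and the tolerance level are assumed choices, not our paper's exact setup, but it illustrates the contrast described above:

```r
## Toy sketch of the normal vs Laplace comparison (assumed prior and
## tolerance, not the paper's exact setup). Both models are scaled to unit
## variance, so only the location parameter is unknown.
set.seed(7)
n <- 100
obs <- rnorm(n)                            # "true" model: normal

S1 <- function(x) c(mean(x), median(x), var(x))  # inconsistent choice
S2 <- function(x) mad(x)                         # consistent choice

abc_prob_normal <- function(S, N = 1e4, q = 0.01) {
  s_obs <- S(obs)
  m <- sample(1:2, N, replace = TRUE)      # uniform prior on the models
  d <- numeric(N)
  for (i in 1:N) {
    th <- rnorm(1, 0, 2)                   # loose prior on the location
    z <- if (m[i] == 1) rnorm(n, th) else  # normal model, else Laplace
      th + sample(c(-1, 1), n, TRUE) * rexp(n) / sqrt(2)  # with variance 1
    d[i] <- sum((S(z) - s_obs)^2)
  }
  keep <- d <= quantile(d, q)
  mean(m[keep] == 1)                       # ABC estimate of P(normal | data)
}

abc_prob_normal(S1)  # expected to stall near a fixed value as n grows
abc_prob_normal(S2)  # expected to approach one, the normal model being true
```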

(As a striking coincidence, Hélène Massam and Gérard Letac [re]posted today on arXiv a paper about the behaviour of the Bayes factor for contingency tables when the hyperparameter goes to zero, where they establish the consistency of the said Bayes factor under the sparser model. No Jeffreys-Lindley paradox in that case.)

from my office

Posted in pictures on October 24, 2011 by xi'an

understanding computational Bayesian statistics: a reply from Bill Bolstad

Posted in Books, R, Statistics, University life on October 24, 2011 by xi'an

Bill Bolstad wrote a reply to my review of his book Understanding computational Bayesian statistics last week and here it is, unedited except for the first paragraph where he thanks me for the opportunity to respond, “so readers will see that the book has some good features beyond having a “nice cover”.” (!) I simply processed the Word document into an html output and put a Read More bar in the middle as it is fairly detailed. (As indicated at the beginning of my review, I am obviously biased on the topic: thus, I will not comment on the reply, lest we get into an infinite regress!)

The target audience for this book is upper-division undergraduate students and first-year graduate students in statistics whose prior statistical education has been mostly frequentist based. Many will have knowledge of Bayesian statistics at an introductory level similar to that in my first book, but some will have had no previous Bayesian statistics course. Being self-contained, it will also be suitable for statistical practitioners without a background in Bayesian statistics.

The book aims to show that:

  1. Bayesian statistics makes different assumptions from frequentist statistics, and these differences lead to the advantages of the Bayesian approach.
  2. Finding the proportional posterior is easy; however, finding the exact posterior distribution is difficult in practice, even numerically, especially for models with many parameters.
  3. Inferences can be based on a (random) sample from the posterior.
  4. There are methods for drawing samples from the incompletely known posterior.
  5. Direct reshaping methods become inefficient for models with a large number of parameters.
  6. We can find a Markov chain whose long-run distribution has the same shape as the posterior. A draw from this chain after it has run a long time can be considered a random draw from the posterior.
  7. We have many choices in setting up a Markov chain Monte Carlo algorithm. The book shows what should be considered, and how problems can be detected from the sample output of the chain.
  8. An independent Metropolis-Hastings chain with a suitable heavy-tailed candidate distribution will perform well, particularly for regression type models. The book shows all the details needed to set up such a chain.
  9. The Gibbs sampling algorithm is especially well suited for hierarchical models.

I am satisfied that the book has achieved the goals that I set out above. The title “Understanding Computational Bayesian Statistics” explains what this book is about. I want the reader (who has a background in frequentist statistics) to understand how computational Bayesian statistics can be applied to models he/she is familiar with. I keep an up-to-date errata on the book website. The website also contains the computer software used in the book. This includes Minitab macros and R-functions. These were used because they have good data analysis capabilities that can be used in conjunction with the simulations. The website also contains Fortran executables that are much faster for models containing more parameters, and WinBUGS code for the examples in the book.
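As an aside on point 8 above, here is a minimal R sketch of an independence Metropolis-Hastings sampler with a heavy-tailed Student-t candidate matched to the mode and curvature of the target. This is my own toy illustration (arbitrary likelihood and Cauchy prior), not code from the book:

```r
## Toy illustration of point 8: independence Metropolis-Hastings with a
## heavy-tailed Student-t candidate centred at the posterior mode
## (target and prior are arbitrary choices, not taken from the book).
set.seed(1)
x <- rnorm(30, mean = 1)                    # data
logpost <- function(th)                     # N(th,1) likelihood, Cauchy prior
  sum(dnorm(x, th, 1, log = TRUE)) + dcauchy(th, log = TRUE)

mode <- optimize(logpost, c(-5, 5), maximum = TRUE)$maximum
sd0  <- 1 / sqrt(length(x))                 # rough curvature scale

niter <- 1e4
th <- numeric(niter); th[1] <- mode
for (t in 2:niter) {
  cand <- mode + sd0 * rt(1, df = 4)        # heavy-tailed candidate
  logr <- logpost(cand) - logpost(th[t - 1]) +          # log acceptance
    dt((th[t - 1] - mode) / sd0, df = 4, log = TRUE) -  # ratio, with the
    dt((cand - mode) / sd0, df = 4, log = TRUE)         # candidate density
  th[t] <- if (log(runif(1)) < logr) cand else th[t - 1]
}
mean(th)                                    # posterior mean estimate
```

Because the t candidate has heavier tails than the (roughly normal) posterior, the importance-like weights stay bounded and the chain mixes well, which is the rationale behind the recommendation.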

on the way to work

Posted in pictures, Travel, University life on October 23, 2011 by xi'an

Here are some sights I cross when biking to work. Of course, it is not always that bright and sunny! (Like three days ago, when it rained and I fell from my bike and broke another rib…) The sculpture is in fact a phone booth made by Sophie Calle, who is the only one to know the number and who calls at random times… It has never happened when I crossed the bridge, so far.

And a terrific final today, both in black and in white/blue! A game for the ages on both sides. So much the opposite of the semi-final, when France should not have won against Wales. (It helps to be supporting both France and New Zealand! When Scotland is not playing…)
