## The winds of Winter [Bayesian prediction]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , on October 7, 2014 by xi'an A surprising entry on arXiv this morning: Richard Vale (from Christchurch, NZ) has posted a paper about the characters appearing in the yet hypothetical next volume of George R.R. Martin’s Song of ice and fire series, The winds of Winter [not even put for pre-sale on amazon!]. Using the previous five books in the series and the frequency of occurrence of characters’ point of view [each chapter being told as from the point of view of one single character], Vale proceeds to model the number of occurrences in a given book by a truncated Poisson model, $x_{it} \sim \mathcal{P}(\lambda_i)\text{ if }|t-\beta_i|<\tau_i$

in order to account for [most] characters dying at some point in the series. All parameters are endowed with prior distributions, including the terrible “large” hyperpriors familiar to BUGS users… Despite the code being written in R by the author. The modelling does not use anything but the frequencies of the previous books, so knowledge that characters like Eddard Stark had died is not exploited. (Nonetheless, the prediction gives zero chapter to this character in the coming volumes.) Interestingly, a character who seemingly died at the end of the last book is still given a 60% probability of having at least one chapter in  The winds of Winter [no spoiler here, but many in the paper itself!]. As pointed out by the author, the model as such does not allow for prediction of new-character chapters, which remains likely given Martin’s storytelling style! Vale still predicts 11 new-character chapters, which seems high if considering the series should be over in two more books [and an unpredictable number of years!].

As an aside, this paper makes use of the truncnorm R package, which I did not know and which is based on John Geweke’s accept-reject algorithm for truncated normals that I (independently) proposed a few years later.

## Bayesian computational tools

Posted in Statistics with tags , , , , , , on April 10, 2013 by xi'an I just arXived a survey entitled Bayesian computational tools in connection with a chapter the editors of the Annual Review of Statistics and Its Application asked me to write. (A puzzling title, I would have used Applications, not Application. Puzzling journal too: endowed with a prestigious editorial board, I wonder at the long-term perspectives of the review, once “all” topics have been addressed. At least, the “non-profit” aspect is respected: $100 for personal subscriptions and$250 for libraries, plus a one-year complimentary online access to volume 1.) Nothing terribly novel in my review, which illustrates some computational tool in some Bayesian settings, missing five or six pages to cover particle filters and sequential Monte Carlo. I however had fun with a double-exponential (or Laplace) example. This distribution indeed allows for a closed-form posterior distribution on the location parameter under a normal prior, which can be expressed as a mixture of truncated normal distributions. A mixture of (n+1) normal distributions for a sample of size n. We actually noticed this fact (which may already be well-known) when looking at our leading example in the consistent ABC choice paper, but it vanished from the appendix in the later versions. As detailed in the previous post, I also fought programming issues induced by this mixture, due to round-up errors in the most extreme components, until all approaches provided similar answers.

## painful truncnorm

Posted in R, Statistics with tags , , , , , , on April 9, 2013 by xi'an As I wanted to simulate truncated normals in a hurry, I coded the inverse cdf approach:

truncnorm=function(a,b,mu,sigma){
u=runif(1)
u=qnorm(pnorm((a-mu)/sigma)*(1-u)+u*pnorm((b-mu)/sigma))
return(mu+sigma*u)
}


instead of using my own accept-reject algorithm. Poor shortcut as the method fails when a and b are too far from μ

> truncnorm(1,2,3,4)
 -0.4912926
> truncnorm(1,2,13,1)
 Inf


So I introduced a control (and ended up wasting more time than if I had used my optimised accept-reject version!)

truncnorm=function(a,b,mu,sigma){
u=runif(1)
if (pnorm((b-mu)/sigma)-pnorm((a-mu)/sigma)>0){
u=qnorm(pnorm((a-mu)/sigma)*(1-u)+u*pnorm((b-mu)/sigma))
}else{
u=-qnorm(pnorm(-(a-mu)/sigma)*(1-u)-u*pnorm(-(b-mu)/sigma))}
return(mu+sigma*u)
}


As shown by the above, it works, even when a=1, b=2 and μ=20. However, this eventually collapses as well and I ended up installing the msm R package that includes rtnorm, an R function running my accept-reject version. (This package was written by Chris Jackson from the MRC Unit in Cambridge.)

## R for dummies

Posted in Books, R, Statistics, University life with tags , , , , , , , , on October 20, 2012 by xi'an Just saw this nice review of R for dummies. And thought after this afternoon class that my students in the simulation course at Paris-Dauphine could clearly benefit from reading it! They in fact had a terrible time simulating a truncated normal distribution by accept-reject. As they could not get the notion of normalising constants… (Yes, indeed, this very truncated normal distribution!) Even the validity of simulating a normal variate until the truncation is satisfied was not obvious to them and they took forever to program the corresponding code. Anyway, I will certainly order the book to check for myself (after receiving Genetics for dummies to make sure I use the right vocabulary, even though it is a bit too light in the end…)! And write a review for CHANCE if it generates enough interest in doing so…

## new typos in Monte Carlo Statistical Methods

Posted in Books, Statistics, University life with tags , , , , , , , , on December 7, 2011 by xi'an Thanks to Jay Bartroff for pointing out those typos after teaching from Monte Carlo Statistical Methods:

• On page 52, the gamma Ga(α, β) distribution uses β as a rate parameter while in other places it is a scale parameter, see, e.g. eqn (2.2) [correct, I must say the parameterisation of the gamma distribution is a pain and, while we tried to homogenise the notation with the second parameter being the rate, there are places like this where either the rate convention (as in the exponential distribution) or the scale convention (as in the generation) is the natural one…]
• Still on page 52, in Example 2.20, truncated normals are said to be discussed after Example 1.5, but they’re not. [There is a mention made of constrained parameters right after but this is rather cryptic!]
• On page 53, the ratio f/gα following the second displayed eqn is missing some terms [or, rather, the equality sign should be a proportional sign]
• Still on page 53, in eqn (2.11), the whole expression, rather than the square root, should be divided by 2 [yes, surprising typo given that it was derived correctly in the original paper!]
• On page 92, the exact constraint is that supp(g) actually needs only contain the intersection of supp(f) and supp(h), such as when approximating tail probabilities [correct if the importance sampling method is only used for a single function h, else the condition stands as is]
• On page 94, fY does not need that integral in the denominator [correct, we already corrected for the truncation by subtracting 4.5 in the exponential]
• On page 114, Problem 3.22, ωi is missing a factor of 1/n [correct]
• On page 218, in Example 6.24, P00=0 [indeed, our remark that Pxx>0 should start with x=1. Note that this does not change the aperiodicity, though]
• On page 282, the log α after the 2nd displayed equation should be eα [correct, this was pointed out in a previous list of typos, but clearly not corrected in the latest printing!]
• On page 282, in the 5th displayed equation there are missing factors π(α’|b)/π(α0|b) in rejection probability [actually, no, because, those terms being both proposals and priors, they cancel in the ratio. We could add a sentence to this effect to explain why, though.]
• On page 634, the reference page for exponential distribution is mistakenly given as 99 [wow, very thorough reading! There is an exponential distribution involved on page 99 but I agree this is not the relevant page…]