## drawing surface plots on the IR³ simplex

Posted in pictures, R, Statistics, University life with tags , , , , , , on October 18, 2013 by xi'an

As a result of a corridor conversation in Warwick, I started looking at distributions on the IR³ simplex,

$\{(x_1,x_2,x_3)\in\mathbb{R}_+^3;\ x_1+x_2+x_3=1\},$

and wanted to plot the density in a nice way. As I could not find a proper package on CRAN, the closer being the BMAmevt (for Bayesian Model Averaging for Multivariate Extremes) R package developed by a former TSI Master student, Anne Sabourin, I ended up programming the thing myself. And producing the picture above. Here is the code, for all it is worth:

# setting the limits
par(mar=c(0,0,0,0),bg="black")
plot(c(0,1),col="white",axes=F,xlab="",ylab="",
xlim=c(-1,1)*1.1/sqrt(2),ylim=c(-.1,sqrt(3/2))*1.1)

# density on a grid with NAs outside, as in image()
gride=matrix(NA,ncol=520,nrow=520)
ww3=ww2=seq(.01,.99,le=520)
for (i in 1:520){
cur=ww2[i];op=1-cur
for (j in 1:(521-i))
gride[i,j]=mydensity(c(cur,ww3[j],op-ww3[j]))
}

# preparing the graph
subset=(1:length(gride))[!is.na(gride)]
logride=log(gride[subset])
grida=(logride-min(logride))/diff(range(logride))
grolor=terrain.colors(250)[1+trunc(as.vector(grida)*250)]
iis=(subset%%520)+520*(subset==520)
jis=(subset%/%520)+1

# plotting the value of the (log-)density
# at each point of the grid
points(x=(ww3[jis]-ww2[iis])/sqrt(2),
y=(1-ww3[jis]-ww2[iis])/sqrt(2/3),
pch=20,col=grolor,cex=.3)


## a general framework for updating belief functions

Posted in Books, Statistics, University life with tags , , , , , , , , , on July 15, 2013 by xi'an

Pier Giovanni Bissiri, Chris Holmes and Stephen Walker have recently arXived the paper related to Sephen’s talk in London for Bayes 250. When I heard the talk (of which some slides are included below), my interest was aroused by the facts that (a) the approach they investigated could start from a statistics, rather than from a full model, with obvious implications for ABC, & (b) the starting point could be the dual to the prior x likelihood pair, namely the loss function. I thus read the paper with this in mind. (And rather quickly, which may mean I skipped important aspects. For instance, I did not get into Section 4 to any depth. Disclaimer: I wasn’t nor is a referee for this paper!)

The core idea is to stick to a Bayesian (hardcore?) line when missing the full model, i.e. the likelihood of the data, but wishing to infer about a well-defined parameter like the median of the observations. This parameter is model-free in that some degree of prior information is available in the form of a prior distribution. (This is thus the dual of frequentist inference: instead of a likelihood w/o a prior, they have a prior w/o a likelihood!) The approach in the paper is to define a “posterior” by using a functional type of loss function that balances fidelity to prior and fidelity to data. The prior part (of the loss) ends up with a Kullback-Leibler loss, while the data part (of the loss) is an expected loss wrt to l(THETASoEUR,x), ending up with the definition of a “posterior” that is

$\exp\{ -l(\theta,x)\} \pi(\theta)$

the loss thus playing the role of the log-likelihood.

I like very much the problematic developed in the paper, as I think it is connected with the real world and the complex modelling issues we face nowadays. I also like the insistence on coherence like the updating principle when switching former posterior for new prior (a point sorely missed in this book!) The distinction between M-closed M-open, and M-free scenarios is worth mentioning, if only as an entry to the Bayesian processing of pseudo-likelihood and proxy models. I am however not entirely convinced by the solution presented therein, in that it involves a rather large degree of arbitrariness. In other words, while I agree on using the loss function as a pivot for defining the pseudo-posterior, I am reluctant to put the same faith in the loss as in the log-likelihood (maybe a frequentist atavistic gene somewhere…) In particular, I think some of the choices are either hard or impossible to make and remain unprincipled (despite a call to the LP on page 7).  I also consider the M-open case as remaining unsolved as finding a convergent assessment about the pseudo-true parameter brings little information about the real parameter and the lack of fit of the superimposed model. Given my great expectations, I ended up being disappointed by the M-free case: there is no optimal choice for the substitute to the loss function that sounds very much like a pseudo-likelihood (or log thereof). (I thought the talk was more conclusive about this, I presumably missed a slide there!) Another great expectation was to read about the proper scaling of the loss function (since L and wL are difficult to separate, except for monetary losses). The authors propose a “correct” scaling based on balancing both faithfulness for a single observation, but this is not a completely tight argument (dependence on parametrisation and prior, notion of a single observation, &tc.)

The illustration section contains two examples, one of which is a full-size or at least challenging  genetic data analysis. The loss function is based on a logistic  pseudo-likelihood and it provides results where the Bayes factor is in agreement with a likelihood ratio test using Cox’ proportional hazard model. The issue about keeping the baseline function as unkown reminded me of the Robbins-Wasserman paradox Jamie discussed in Varanasi. The second example offers a nice feature of putting uncertainties onto box-plots, although I cannot trust very much the 95%  of the credibles sets. (And I do not understand why a unique loss would come to be associated with the median parameter, see p.25.)

Watch out: Tomorrow’s post contains a reply from the authors!

## ABC in 1984

Posted in Statistics with tags , , , , on November 9, 2009 by xi'an

“Bayesian statistics and Monte Carlo methods are ideally suited to the task of passing many models over one dataset” D. Rubin, Annals of Statistics, 1984

Jean-Louis Foulley sent me a 1984 paper by Don Rubin that details in no uncertain terms the accept-reject algorithm at the core of the ABC algorithm! Namely,

Generate $\theta\sim\pi(\theta)$;
Generate $x\sim f(x|\theta)$;
Accept $\theta$ if $x=x_0$

Obviously, ABC goes further by replacing the acceptance step with the tolerance condition

$d(x,x_0) < \epsilon$

but this early occurence is worth noticing nonetheless. It is also interesting to see that Don Rubin does not promote this simulation method in situations where the likelihood is not available but rather as an intuitive way to understanding posterior distributions from a frequentist perspective, because $\theta$‘s from the posterior are those that could have generated the observed data. (The issue of the zero probability of the exact equality between simulated and observed data is not dealt with in the paper, maybe because the notion of a “match” between simulated and observed data is not clearly defined.) Apart from this historical connection, I recommend the entire paper as providing a very compelling argument for practical Bayesianism!