## Archive for posterior distribution

## automated ABC summary combination

Posted in Books, pictures, Statistics, University life with tags ABC, José Miguel Bernardo, Lasso, posterior distribution, semi-automatic ABC, summary statistics, University of Oxford, Wasserstein distance on March 16, 2017 by xi'an

**J**onathan Harrison and Ruth Baker (Oxford University) arXived this morning a paper on the optimal combination of summaries for ABC, in the sense of deriving the proper weights in a Euclidean distance involving all the available summaries. The idea is to find the weights that lead to the maximal distance between prior and posterior, in a way reminiscent of Bernardo’s (1979) maximal information principle. Plus a sparsity penalty à la Lasso. The associated algorithm is sequential, in that the weights are updated at each iteration. The paper does not get into theoretical justifications but considers instead several examples with limited numbers of both parameters and summary statistics. This may highlight the limitations of the approach, in that handling (and eliminating) a large number of parameters may prove impossible this way, when compared with optimisation methods like random forests, or with summary-free distances between empirical distributions like the Wasserstein distance.

## MAP as Bayes estimators

Posted in Books, Kids, Statistics with tags Bayesian inference, counterexample, loss function, MAP estimators, posterior distribution on November 30, 2016 by xi'an

**R**obert Bassett and Julio Deride just arXived a paper discussing the position of MAPs within Bayesian decision theory. A point I have discussed extensively on the ‘Og!

“…we provide a counterexample to the commonly accepted notion of MAP estimators as a limit of Bayes estimators having 0-1 loss.”

The authors mention *The Bayesian Choice* as stating this property without further precautions, and I completely agree that I was careless in this regard! The difficulty lies with the limit of the maximisers not necessarily being the maximiser of the limit. The paper includes an example to this effect, with a prior as above, associated with a sampling distribution that does not depend on the parameter. The sufficient conditions proposed therein are that the posterior density is almost surely proper or quasiconcave.

This is a neat mathematical characterisation that cleans up this “folk theorem” about MAP estimators. And for which the authors are to be congratulated! However, I am not very excited by the limiting property, whether it holds or not, as I have difficulties conceiving the use of a sequence of losses in a mildly realistic case. I rather prefer the alternative characterisation of MAP estimators by Burger and Lucka as proper Bayes estimators under another type of loss function, albeit a rather artificial one.
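For the record, the folk argument runs as follows (a heuristic sketch in my own notation, not taken from the paper): under a 0-1-type loss with tolerance ε, the Bayes estimator is

```latex
\delta_\varepsilon(x)
  \;=\; \arg\min_{d}\; \mathbb{P}\bigl(\|\theta - d\| > \varepsilon \mid x\bigr)
  \;=\; \arg\max_{d} \int_{\|\theta - d\| \le \varepsilon} \pi(\theta \mid x)\,\mathrm{d}\theta ,
```

and letting ε go to zero is supposed to return the posterior mode. The counterexample shows that, without conditions like propriety or quasiconcavity of the posterior density, the maximisers δ_ε need not converge to the maximiser of π(·|x).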

## drawing surface plots on the IR³ simplex

Posted in pictures, R, Statistics, University life with tags Bayesian inference, image(), posterior distribution, R, simplex, surface, terrain.colors() on October 18, 2013 by xi'an

**A**s a result of a corridor conversation in Warwick, I started looking at distributions on the IR³ simplex, and wanted to plot the density in a nice way. As I could not find a proper package on CRAN, the closest being the BMAmevt (for *Bayesian Model Averaging for Multivariate Extremes*) R package developed by a former TSI Master student, Anne Sabourin, I ended up programming the thing myself, and producing the picture above. Here is the code, for all it is worth:

```r
# setting the limits
par(mar=c(0,0,0,0),bg="black")
plot(c(0,1),col="white",axes=F,xlab="",ylab="",
     xlim=c(-1,1)*1.1/sqrt(2),ylim=c(-.1,sqrt(3/2))*1.1)
# example target density on the simplex: a bump centred at (1/3,1/3,1/3)
# (a stand-in, not in the original post -- replace with your own density)
mydensity=function(w) exp(-10*sum((w-1/3)^2))
# density on a grid with NAs outside, as in image()
gride=matrix(NA,ncol=520,nrow=520)
ww3=ww2=seq(.01,.99,le=520)
for (i in 1:520){
  cur=ww2[i];op=1-cur
  for (j in 1:(521-i))
    gride[i,j]=mydensity(c(cur,ww3[j],op-ww3[j]))
}
# preparing the graph
subset=(1:length(gride))[!is.na(gride)]
logride=log(gride[subset])
grida=(logride-min(logride))/diff(range(logride))
grolor=terrain.colors(250)[pmin(250,1+trunc(as.vector(grida)*250))]
# row and column indices of the retained grid points
# (fixed: the original modulo trick mis-handled multiples of 520)
iis=((subset-1)%%520)+1
jis=((subset-1)%/%520)+1
# plotting the value of the (log-)density at each point of the grid
points(x=(ww3[jis]-ww2[iis])/sqrt(2),
       y=(1-ww3[jis]-ww2[iis])/sqrt(2/3),
       pch=20,col=grolor,cex=.3)
```

## a general framework for updating belief functions

Posted in Books, Statistics, University life with tags ABC, Bayesian inference, foundations, Genetics, Kullback, likelihood-free methods, loss functions, median, posterior distribution, scaling on July 15, 2013 by xi'an

**P**ier Giovanni Bissiri, Chris Holmes and Stephen Walker have recently arXived the paper related to Stephen’s talk in London for Bayes 250. When I heard the talk (of which some slides are included below), my interest was aroused by the facts that (a) the approach they investigated could start from a statistic, rather than from a full model, with obvious implications for ABC, & (b) the starting point could be the dual to the prior x likelihood pair, namely the loss function. I thus read the paper with this in mind. (And rather quickly, which may mean I skipped important aspects. For instance, I did not get into Section 4 in any depth. Disclaimer: *I was not, nor am I, a referee for this paper!*)

**T**he core idea is to stick to a (hardcore?) Bayesian line when missing the full model, i.e. the likelihood of the data, while wishing to infer about a well-defined parameter like the median of the observations. This parameter is model-free, in that some degree of prior information is available in the form of a prior distribution. (This is thus the dual of frequentist inference: instead of a likelihood w/o a prior, they have a prior w/o a likelihood!) The approach in the paper is to define a “posterior” by using a functional type of loss function that balances fidelity to prior and fidelity to data. The prior part (of the loss) ends up with a Kullback-Leibler loss, while the data part (of the loss) is an expected loss with respect to ℓ(θ,x), ending up with the definition of a “posterior” that is

π(θ|x) ∝ exp{−ℓ(θ,x)} π(θ),

the loss ℓ thus playing the role of the (negative) log-likelihood.
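As a toy illustration (mine, not from the paper), here is a grid approximation of such a loss-based “posterior” for the median, using the absolute-error loss ℓ(θ,x)=Σ|xᵢ−θ|, an arbitrary scaling w of the loss, and a standard normal prior:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_t(df=2, size=100)  # data from an unspecified (heavy-tailed) model

# grid approximation of the pseudo-posterior pi(theta|x) ∝ exp{-w l(theta,x)} pi(theta)
theta = np.linspace(-5, 5, 2001)
loss = np.abs(x[:, None] - theta[None, :]).sum(axis=0)  # absolute loss targets the median
w = 1.0 / len(x)                                        # an (arbitrary) scaling of the loss
logpost = -w * loss - 0.5 * theta**2                    # standard normal prior, log scale
post = np.exp(logpost - logpost.max())
post /= post.sum() * (theta[1] - theta[0])              # normalise on the grid

pseudo_map = theta[post.argmax()]                       # mode of the pseudo-posterior
```

The arbitrariness of w is precisely the scaling issue discussed below: rescaling the loss reweights data against prior with no likelihood principle to settle the balance.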

**I** like very much the problematic developed in the paper, as I think it is connected with the real world and the complex modelling issues we face nowadays. I also like the insistence on coherence, like the updating principle when switching former posterior for new prior (a point sorely missed in this book!) The distinction between M-closed, M-open, and M-free scenarios is worth mentioning, if only as an entry to the Bayesian processing of pseudo-likelihoods and proxy models. I am however not entirely convinced by the solution presented therein, in that it involves a rather large degree of arbitrariness. In other words, while I agree on using the loss function as a pivot for defining the pseudo-posterior, I am reluctant to put the same faith in the loss as in the log-likelihood (maybe a frequentist atavistic gene somewhere…) In particular, I think some of the choices are either hard or impossible to make and remain unprincipled (despite a call to the LP on page 7). I also consider the M-open case as remaining unsolved, as finding a convergent assessment about the pseudo-true parameter brings little information about the real parameter and the lack of fit of the superimposed model. Given my great expectations, I ended up being disappointed by the M-free case: there is no optimal choice for the substitute to the loss function, which sounds very much like a pseudo-likelihood (or log thereof). (I thought the talk was more conclusive about this, I presumably missed a slide there!) Another great expectation was to read about the proper scaling of the loss function (since L and wL are difficult to separate, except for monetary losses). The authors propose a “correct” scaling based on balancing faithfulness for a single observation, but this is not a completely tight argument (dependence on parametrisation and prior, notion of a single observation, &tc.)

**T**he illustration section contains two examples, one of which is a full-size, or at least challenging, genetic data analysis. The loss function is based on a logistic pseudo-likelihood and it provides results where the Bayes factor is in agreement with a likelihood ratio test using Cox’s proportional hazards model. The issue of keeping the baseline function as unknown reminded me of the Robbins-Wasserman paradox Jamie discussed in Varanasi. The second example offers a nice feature of putting uncertainties onto box-plots, although I cannot put much trust in the 95% credible sets. (And I do not understand why a unique loss would come to be associated with the median parameter, see p.25.)

*Watch out: Tomorrow’s post contains a reply from the authors!*

## ABC in 1984

Posted in Statistics with tags ABC, Bayesian calibration, empirical Bayes methods, frequency properties, posterior distribution on November 9, 2009 by xi'an

“Bayesian statistics and Monte Carlo methods are ideally suited to the task of passing many models over one dataset” D. Rubin, *Annals of Statistics*, 1984

**J**ean-Louis Foulley sent me a 1984 paper by Don Rubin that details in no uncertain terms the accept-reject algorithm at the core of the ABC algorithm! Namely,

Generate θ′ from the prior π(θ);

Generate x from the sampling distribution f(x|θ′);

Accept θ′ if x = x⁰, the observed data.

**O**bviously, ABC goes further by replacing the acceptance step with the tolerance condition

d(x, x⁰) ≤ ε,

but this early occurrence is worth noticing nonetheless. It is also interesting to see that Don Rubin does not promote this simulation method in situations where the likelihood is not available, but rather as an intuitive way to understand posterior distributions from a frequentist perspective, because θ’s from the posterior are those that could have generated the observed data. (The issue of the zero probability of an exact match between simulated and observed data is not dealt with in the paper, maybe because the notion of a “match” between simulated and observed data is not clearly defined.) Apart from this historical connection, I recommend the entire paper as providing a very compelling argument for practical Bayesianism!
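To make the connection concrete, here is a minimal sketch (my toy example, not Rubin’s notation) of the accept-reject scheme with the ABC tolerance relaxation, for a normal-mean model summarised by the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setup: x ~ Normal(theta, 1) summarised by its mean; prior theta ~ Normal(0, sd=2)
n = 20
theta_true = 1.5
x_obs = rng.normal(theta_true, 1.0, size=n)
s_obs = x_obs.mean()

def abc_rejection(s_obs, eps, n_sims=200_000):
    """Rubin-style accept-reject, relaxed by an ABC tolerance eps on the summary distance."""
    theta = rng.normal(0.0, 2.0, size=n_sims)       # 1. generate theta' from the prior
    s_sim = rng.normal(theta, 1.0 / np.sqrt(n))     # 2. simulate the summary given theta'
    keep = np.abs(s_sim - s_obs) <= eps             # 3. accept theta' if d(s, s_obs) <= eps
    return theta[keep]

post = abc_rejection(s_obs, eps=0.05)
# accepted theta's approximate the posterior; the exact match eps = 0 has probability zero
```

Shrinking eps towards zero recovers Rubin’s exact-match step, at the cost of a vanishing acceptance rate, which is precisely the zero-probability issue mentioned above.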