## not Bayesian enough?!

Posted in Books, Statistics, University life with tags , , , , , , , on January 23, 2015 by xi'an

Our random forest paper was alas rejected last week. Alas because I think the approach is a significant advance in ABC methodology when implemented for model choice, avoiding the delicate selection of summary statistics and the report of shaky posterior probability approximation. Alas also because the referees somewhat missed the point, apparently perceiving random forests as a way to project a large collection of summary statistics on a limited dimensional vector as in the Read Paper of Paul Fearnhead and Dennis Prarngle, while the central point in using random forests is the avoidance of a selection or projection of summary statistics.  They also dismissed ou approach based on the argument that the reduction in error rate brought by random forests over LDA or standard (k-nn) ABC is “marginal”, which indicates a degree of misunderstanding of what the classification error stand for in machine learning: the maximum possible gain in supervised learning with a large number of classes cannot be brought arbitrarily close to zero. Last but not least, the referees did not appreciate why we mostly cannot trust posterior probabilities produced by ABC model choice and hence why the posterior error loss is a valuable and almost inevitable machine learning alternative, dismissing the posterior expected loss as being not Bayesian enough (or at all), for “averaging over hypothetical datasets” (which is a replicate of Jeffreys‘ famous criticism of p-values)! Certainly a first time for me to be rejected based on this argument!

## ABC with emulators

Posted in Books, Statistics with tags , , , , , , , on January 9, 2015 by xi'an

A paper on the comparison of emulation methods for Approximate Bayesian Computation was recently arXived by Jabot et al. The idea is to bypass costly simulations of pseudo-data by running cheaper simulation from a pseudo-model or emulator constructed via a preliminary run of the original and costly model. To borrow from the paper introduction, ABC-Emulation runs as follows:

1. design a small number n of parameter values covering the parameter space;
2. generate n corresponding realisations from the model and store the corresponding summary statistics;
3. build an emulator (model) based on those n values;
4. run ABC using the emulator in lieu of the original model.

A first emulator proposed in the paper is to use local regression, as in Beaumont et al. (2002), except that it goes the reverse way: the regression model predicts a summary statistics given the parameter value. The second and last emulator relies on Gaussian processes, as in Richard Wilkinson‘s as well as Ted Meeds’s and Max Welling‘s recent work [also quoted in the paper]. The comparison of the above emulators is based on an ecological community dynamics model. The results are that the stochastic version is superior to the deterministic one, but overall not very useful when implementing the Beaumont et al. (2002) correction. The paper however does not define what deterministic and what stochastic mean…

“We therefore recommend the use of local regressions instead of Gaussian processes.”

While I find the conclusions of the paper somewhat over-optimistic given the range of the experiment and the limitations of the emulator options (like non-parametric conditional density estimation), it seems to me that this is a direction to be pursued as we need to be able to simulate directly a vector of summary statistics instead of the entire data process, even when considering an approximation to the distribution of those summaries.

## an ABC experiment

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , on November 24, 2014 by xi'an

In a cross-validated forum exchange, I used the code below to illustrate the working of an ABC algorithm:

#normal data with 100 observations
n=100
x=rnorm(n)
#observed summaries

#normal x gamma prior
priori=function(N){
return(cbind(rnorm(N,sd=10),
1/sqrt(rgamma(N,shape=2,scale=5))))
}

ABC=function(N,alpha=.05){

prior=priori(N) #reference table

#pseudo-data
summ=matrix(0,N,2)
for (i in 1:N){
xi=rnorm(n)*prior[i,2]+prior[i,1]
}

#normalisation factor for the distance
#distance
#selection
posterior=prior[dist<quantile(dist,alpha),]}


Hence I used the median and the mad as my summary statistics. And the outcome is rather surprising, for two reasons: the first one is that the posterior on the mean μ is much wider than when using the mean and the variance as summary statistics. This is not completely surprising in that the latter are sufficient, while the former are not. Still, the (-10,10) range on the mean is way larger… The second reason for surprise is that the true posterior distribution cannot be derived since the joint density of med and mad is unavailable.

After thinking about this for a while, I went back to my workbench to check the difference with using mean and variance. To my greater surprise, I found hardly any difference! Using the almost exact ABC with 10⁶ simulations and a 5% subsampling rate returns exactly the same outcome. (The first row above is for the sufficient statistics (mean,standard deviation) while the second row is for the (median,mad) pair.) Playing with the distance does not help. The genuine posterior output is quite different, as exposed on the last row of the above, using a basic Gibbs sampler since the posterior is not truly conjugate.

## Sequentially Constrained Monte Carlo

Posted in Books, Mountains, pictures, Statistics, University life with tags , , , , , , , , , , on November 7, 2014 by xi'an

This newly arXived paper by S. Golchi and D. Campbell from Vancouver (hence the above picture) considers the (quite) interesting problem of simulating from a target distribution defined by a constraint. This is a question that have bothered me for a long while as I could not come up with a satisfactory solution all those years… Namely, when considering a hard constraint on a density, how can we find a sequence of targets that end up with the restricted density? This is of course connected with the zero measure case posted a few months ago. For instance, how do we efficiently simulate a sample from a Student’s t distribution with a fixed sample mean and a fixed sample variance?

“The key component of SMC is the filtering sequence of distributions through which the particles evolve towards the target distribution.” (p.3)

This is indeed the main issue! The paper considers using a sequence of intermediate targets hardening progressively the constraint(s), along with an SMC sampler, but this recommendation remains rather vague and hence I am at loss as to how to make it work when the exact constraint implies a change of measure. The first example is monotone regression where y has mean f(x) and f is monotone. (Everything is unidimensional here.) The sequence is then defined by adding a multiplicative term that is a function of ∂f/∂x, for instance

Φ(τ∂f/∂x),

with τ growing to infinity to make the constraint moving from soft to hard. An interesting introduction, even though the hard constraint does not imply a change of parameter space or of measure. The second example is about estimating the parameters of an ODE, with the constraint being the ODE being satisfied exactly. Again, not exactly what I was looking for. But with an exotic application to deaths from the 1666 Black (Death) plague.

And then the third example is about ABC and the choice of summary statistics! The sequence of constraints is designed to keep observed and simulated summary statistics close enough when the dimension of those summaries increases, which means they are considered simultaneously rather than jointly. (In the sense of Ratmann et al., 2009. That is, with a multidimensional distance.) The model used for the application of the SMC is the dynamic model of Wood (2010, Nature). The outcome of this specific implementation is not that clear compared with alternatives… And again sadly does not deal with the/my zero measure issue.

## Relevant statistics for Bayesian model choice [hot off the press!]

Posted in Books, Statistics, University life with tags , , , , , , on October 30, 2014 by xi'an

Our paper about evaluating statistics used for ABC model choice has just appeared in Series B! It somewhat paradoxical that it comes out just a few days after we submitted our paper on using random forests for Bayesian model choice, thus bypassing the need for selecting those summary statistics by incorporating all statistics available and letting the trees automatically rank those statistics in term of their discriminating power. Nonetheless, this paper remains an exciting piece of work (!) as it addresses the more general and pressing question of the validity of running a Bayesian analysis with only part of the information contained in the data. Quite usefull in my (biased) opinion when considering the emergence of approximate inference already discussed on this ‘Og…

[As a trivial aside, I had first used fresh from the press(es) as the bracketted comment, before I realised the meaning was not necessarily the same in English and in French.]

## reliable ABC model choice via random forests

Posted in pictures, R, Statistics, University life with tags , , , , , , , on October 29, 2014 by xi'an

After a somewhat prolonged labour (!), we have at last completed our paper on ABC model choice with random forests and submitted it to PNAS for possible publication. While the paper is entirely methodological, the primary domain of application of ABC model choice methods remains population genetics and the diffusion of this new methodology to the users is thus more likely via a media like PNAS than via a machine learning or statistics journal.

When compared with our recent update of the arXived paper, there is not much different in contents, as it is mostly an issue of fitting the PNAS publication canons. (Which makes the paper less readable in the posted version [in my opinion!] as it needs to fit the main document within the compulsory six pages, relegated part of the experiments and of the explanations to the Supplementary Information section.)

## insufficient statistics for ABC model choice

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , on October 17, 2014 by xi'an

[Here is a revised version of my comments on the paper by Julien Stoehr, Pierre Pudlo, and Lionel Cucala, now to appear [both paper and comments] in Statistics and Computing special MCMSki 4 issue.]

Approximate Bayesian computation techniques are 2000’s successors of MCMC methods as handling new models where MCMC algorithms are at a loss, in the same way the latter were able in the 1990’s to cover models that regular Monte Carlo approaches could not reach. While they first sounded like “quick-and-dirty” solutions, only to be considered until more elaborate solutions could (not) be found, they have been progressively incorporated within the statistican’s toolbox as a novel form of non-parametric inference handling partly defined models. A statistically relevant feature of those ACB methods is that they require replacing the data with smaller dimension summaries or statistics, because of the complexity of the former. In almost every case when calling ABC is the unique solution, those summaries are not sufficient and the method thus implies a loss of statistical information, at least at a formal level since relying on the raw data is out of question. This forced reduction of statistical information raises many relevant questions, from the choice of summary statistics to the consistency of the ensuing inference.

In this paper of the special MCMSki 4 issue of Statistics and Computing, Stoehr et al. attack the recurrent problem of selecting summary statistics for ABC in a hidden Markov random field, since there is no fixed dimension sufficient statistics in that case. The paper provides a very broad overview of the issues and difficulties related with ABC model choice, which has been the focus of some advanced research only for a few years. Most interestingly, the authors define a novel, local, and somewhat Bayesian misclassification rate, an error that is conditional on the observed value and derived from the ABC reference table. It is the posterior predictive error rate

$\mathbb{P}^{\text{ABC}}(\hat{m}(y^{\text{obs}})\ne m|S(y^{\text{obs}}))$

integrating in both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation. Or rather given the transform of the observation by the summary statistic S. The authors even go further to define the error rate of a classification rule based on a first (collection of) statistic, conditional on a second (collection of) statistic (see Definition 1). A notion rather delicate to validate on a fully Bayesian basis. And they advocate the substitution of the unreliable (estimates of the) posterior probabilities by this local error rate, estimated by traditional non-parametric kernel methods. Methods that are calibrated by cross-validation. Given a reference summary statistic, this perspective leads (at least in theory) to select the optimal summary statistic as the one leading to the minimal local error rate. Besides its application to hidden Markov random fields, which is of interest per se, this paper thus opens a new vista on calibrating ABC methods and evaluating their true performances conditional on the actual data. (The advocated abandonment of the posterior probabilities could almost justify the denomination of a paradigm shift. This is also the approach advocated in our random forest paper.)