## inflation, evidence and falsifiability

Posted in Books, pictures, Statistics, University life on July 27, 2015 by xi'an

[Ewan Cameron pointed me to this paper and blogged about his impressions a few weeks ago. And then Peter Coles wrote a (properly) critical blog entry yesterday. Here are my quick impressions, as an add-on.]

“As the cosmological data continues to improve with its inevitable twists, it has become evident that whatever the observations turn out to be they will be lauded as ‘proof of inflation’.” G. Gubitosi et al.

In an arXiv paper with the above title, Gubitosi et al. embark upon a generic and critical [and astrostatistical] evaluation of Bayesian evidence and of the Bayesian paradigm. Perfect topic and material for another blog post!

“Part of the problem stems from the widespread use of the concept of Bayesian evidence and the Bayes factor (…) The limitations of the existing formalism emerge, however, as soon as we insist on falsifiability as a pre-requisite for a scientific theory (…) the concept is more suited to playing the lottery than to enforcing falsifiability: winning is more important than being predictive.” G. Gubitosi et al.

It is somehow quite hard not to quote most of the paper, because prose such as the above abounds. Now, compared with standard practice, the authors introduce a level higher than models, called paradigms, defined as collections of models. (I wonder what the next level is: monads? universes? paradises?) Each paradigm is associated with a marginal likelihood, obtained by integrating over models and model parameters, which is also the evidence of, or for, the paradigm. And then, assuming a prior on the paradigms, one can compute the posterior over the paradigms… What is the novelty, then, that “forces” falsifiability upon Bayesian testing (or the reverse)?!
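To make the paradigm construction concrete, here is a minimal sketch under purely illustrative assumptions (none of them taken from the paper): Gaussian data $x_i\sim N(\theta,1)$, models indexed by a prior variance $\tau^2$ on $\theta$ ($\tau^2=0$ being the point null), fixed within-paradigm model weights, and a prior on paradigms. The function names `paradigm_evidence` and `paradigm_posterior` and all numerical values are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def model_evidence(xbar, n, tau2):
    """Evidence of the model theta ~ N(0, tau2) for x_1..x_n ~ N(theta, 1),
    written through the sample mean: xbar ~ N(0, tau2 + 1/n).
    tau2 = 0 recovers the point null theta = 0."""
    return normal_pdf(xbar, 0.0, tau2 + 1.0 / n)

def paradigm_evidence(xbar, n, models):
    """Paradigm evidence: model evidences averaged with within-paradigm
    weights, i.e. integrating over models and model parameters."""
    return sum(w * model_evidence(xbar, n, tau2) for w, tau2 in models)

def paradigm_posterior(xbar, n, paradigms, prior):
    """Posterior over paradigms, proportional to prior times paradigm evidence."""
    joint = {name: prior[name] * paradigm_evidence(xbar, n, models)
             for name, models in paradigms.items()}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

# Hypothetical paradigms: a point null, and a "broad" paradigm made of two
# models with prior variances 1 and 100 on theta, equally weighted.
paradigms = {"null": [(1.0, 0.0)], "broad": [(0.5, 1.0), (0.5, 100.0)]}
prior = {"null": 0.5, "broad": 0.5}
print(paradigm_posterior(0.0, 100, paradigms, prior))
print(paradigm_posterior(1.0, 100, paradigms, prior))
```

Working with $\bar x$ rather than the full sample is harmless in this toy setting: the likelihood factorises into a term free of both $\theta$ and the model, which cancels in the posterior over paradigms. Structurally, this is nothing but standard Bayesian model averaging one level up.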

“However, science is not about playing the lottery and winning, but falsifiability instead, that is, about winning given that you have bore the full brunt of potential loss, by taking full chances of not winning a priori. This is not well incorporated into the Bayesian evidence because the framework is designed for other ends, those of model selection rather than paradigm evaluation.” G. Gubitosi et al.

The paper starts with a criticism of the Bayes factor in the point null test of a Gaussian mean, as overly penalising the null, since the evidence in favour of the null against the alternative only grows as a power law. Not much new there: it is well known that the Bayes factor does not converge at the same speed under the null and under the alternative… The first proposal of the authors is to consider the distribution of the marginal likelihood of the null model under the [or a] prior predictive encompassing both hypotheses or only the alternative [there is a lack of precision at this stage of the paper], in order to calibrate the observed value against its expected distribution. What is the connection with falsifiability? The notion is that, under the prior predictive, most of the mass lies on very low values of the evidence, leading to a conclusion against the null. If the null is replaced with the alternative marginal likelihood, its mass becomes concentrated on the largest values of the evidence, which is interpreted as the signature of an unfalsifiable theory. In simpler terms, it means you can never prove a mean θ is different from zero. Not a tremendous piece of news, all things considered…
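Both points can be checked numerically in a minimal sketch, assuming $x_1,\ldots,x_n\sim N(\theta,1)$, $H_0:\theta=0$ against $H_1:\theta\sim N(0,\tau^2)$, and reading the calibrating prior predictive as the one under the alternative only (one of the two possible readings noted above). The function names are hypothetical.

```python
import math
import random

def log_bf01(xbar, n, tau2=1.0):
    """Log Bayes factor of H0: theta = 0 against H1: theta ~ N(0, tau2),
    for x_1..x_n ~ N(theta, 1) reduced to the (sufficient) sample mean xbar."""
    v0, v1 = 1.0 / n, tau2 + 1.0 / n  # variances of xbar under H0 and under H1
    return 0.5 * math.log(v1 / v0) - 0.5 * xbar ** 2 * (1.0 / v0 - 1.0 / v1)

def log_evidence_null(xbar, n):
    """Log marginal likelihood of H0 (simply the likelihood at theta = 0)."""
    v0 = 1.0 / n
    return -0.5 * math.log(2.0 * math.pi * v0) - 0.5 * xbar ** 2 / v0

def prior_predictive_tail(xbar_obs, n, tau2=1.0, draws=10_000, seed=0):
    """Fraction of prior-predictive draws under H1, xbar ~ N(0, tau2 + 1/n),
    whose null evidence does not exceed the observed null evidence."""
    rng = random.Random(seed)
    sd = math.sqrt(tau2 + 1.0 / n)
    obs = log_evidence_null(xbar_obs, n)
    below = sum(log_evidence_null(rng.gauss(0.0, sd), n) <= obs for _ in range(draws))
    return below / draws

for n in (10, 100, 1000):
    # Under H0 (xbar near 0) versus under theta = 0.5 (xbar near 0.5)
    print(n, log_bf01(0.0, n), log_bf01(0.5, n))
print(prior_predictive_tail(0.0, 100), prior_predictive_tail(3.0, 100))
```

At $\bar x=0$ the log Bayes factor equals $\tfrac12\log(1+n\tau^2)$, so the support for a true null only grows like $\sqrt n$, while at $\bar x=0.5$ (with $\tau^2=1$) it decreases roughly like $-n/8$: the asymmetry in convergence speeds mentioned above.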

“…we can measure the predictivity of a model (or paradigm) by examining the distribution of the Bayesian evidence assuming uniformly distributed data.”

The alternative is to define a tail probability for the evidence, i.e., the probability that it falls below an arbitrarily set bound. What remains unclear to me in this notion is the definition of a prior on the data, as it seems to be model dependent, hence prohibits comparisons between models since these would involve incompatible priors. The paper goes further in that direction by penalising models according to their predictivity, $P$, through the factor $\exp\{-(1-P^2)/P^2\}$. And paradigms as well.
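A minimal sketch of the predictivity penalty, under explicit assumptions: the predictivity $P$ is *read here* as the probability that the evidence falls below a bound when $\bar x$ is uniform on $[-L, L]$, for the same Gaussian point-null and $N(0,\tau^2)$ models as above. The bound, the half-width $L$ and the function names are hypothetical choices, not the paper's; only the penalty $\exp\{-(1-P^2)/P^2\}$ comes from the text.

```python
import math

def evidence(xbar, n, tau2):
    """Evidence of the model theta ~ N(0, tau2) (tau2 = 0: point null) for
    x_1..x_n ~ N(theta, 1), written in terms of the sample mean xbar."""
    var = tau2 + 1.0 / n
    return math.exp(-0.5 * xbar ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def predictivity(n, tau2, bound, half_width):
    """Assumed reading of P: probability that the evidence falls below `bound`
    when xbar ~ Uniform(-half_width, half_width). Computed exactly, since the
    Gaussian evidence is below `bound` exactly when |xbar| exceeds a cutoff c."""
    var = tau2 + 1.0 / n
    peak = 1.0 / math.sqrt(2.0 * math.pi * var)  # evidence at xbar = 0
    if bound >= peak:
        return 1.0  # evidence below the bound almost everywhere
    c = math.sqrt(2.0 * var * math.log(peak / bound))
    return max(0.0, 1.0 - c / half_width)

def predictivity_weight(p):
    """Penalty exp{-(1 - P^2) / P^2}: equal to 1 at P = 1, vanishing as P -> 0."""
    if p == 0.0:
        return 0.0
    return math.exp(-(1.0 - p * p) / (p * p))

p_null = predictivity(100, 0.0, 0.01, 5.0)   # sharp point null
p_alt = predictivity(100, 100.0, 0.01, 5.0)  # vague alternative
print(p_null, predictivity_weight(p_null), p_alt, predictivity_weight(p_alt))
```

Changing either `bound` or `half_width` changes $P$ and hence the weight, which is one way of seeing the arbitrariness discussed in the next paragraph.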

“(…) theoretical matters may end up being far more relevant than any probabilistic issues, of whatever nature. The fact that inflation is not an unavoidable part of any quantum gravity framework may prove to be its greatest undoing.”

Establishing a principled way to weight models would certainly be a major step in the validation of posterior probabilities as a quantitative tool for Bayesian inference, as hinted at in my 1993 paper on the Lindley-Jeffreys paradox, but I do not see such a principle emerging from the paper. Not only because of the arbitrariness in constructing both the predictivity and the associated prior weight, but also because of the impossibility of defining a joint predictive, that is, a predictive across models, without including the weights of those models. This makes the prior probabilities appear on “both sides” of the defining equation… (And I will not mention the issues of constructing a prior distribution of a Bayes factor that are related to Aitkin’s integrated likelihood. And I obviously won’t try to enter the cosmological debate about inflation.)
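Schematically, and assuming for illustration (rather than as the paper's exact definition) that the predictivity $P_k$ of model $\mathfrak{M}_k$ is computed under the joint predictive across models $\mathfrak{M}_1,\ldots,\mathfrak{M}_K$ with prior weights $\pi_1,\ldots,\pi_K$ and marginal likelihoods $m_k(y)$, the circularity reads

$m(y)=\sum_{k=1}^{K}\pi_k\,m_k(y),\qquad P_k=P_k(m)=P_k(\pi_1,\ldots,\pi_K),\qquad \pi_k\propto\exp\left\{-\frac{1-P_k(\pi_1,\ldots,\pi_K)^2}{P_k(\pi_1,\ldots,\pi_K)^2}\right\}$

so that the weights solve a fixed-point equation rather than being defined by it.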

## insufficient statistics for ABC model choice

Posted in Books, Kids, Statistics, University life on October 17, 2014 by xi'an

[Here is a revised version of my comments on the paper by Julien Stoehr, Pierre Pudlo, and Lionel Cucala, now to appear [both paper and comments] in the special MCMSki 4 issue of Statistics and Computing.]

Approximate Bayesian computation (ABC) techniques are the 2000s successors of MCMC methods, handling new models where MCMC algorithms are at a loss, in the same way the latter were able in the 1990s to cover models that regular Monte Carlo approaches could not reach. While they first sounded like “quick-and-dirty” solutions, only to be considered until more elaborate solutions could (not) be found, they have been progressively incorporated within the statistician’s toolbox as a novel form of non-parametric inference handling partly defined models. A statistically relevant feature of those ABC methods is that they require replacing the data with lower-dimensional summaries or statistics, because of the complexity of the former. In almost every case where ABC is the only solution, those summaries are not sufficient and the method thus implies a loss of statistical information, at least at a formal level, since relying on the raw data is out of the question. This forced reduction of statistical information raises many relevant questions, from the choice of summary statistics to the consistency of the ensuing inference.
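The loss of information can be made tangible on a Poisson versus geometric toy comparison (an illustration chosen here, not an example from Stoehr et al.): the sum $S(y)=\sum_i y_i$ is sufficient within each model, yet not for choosing between them. The priors $\lambda\sim\mathrm{Exp}(1)$ and $p\sim\mathcal{U}(0,1)$ are hypothetical choices that make both marginal likelihoods available in closed form.

```python
import math

def log_marginal_poisson(y):
    """log m0(y) for y_i ~ Poisson(lam), lam ~ Exp(1), with lam integrated out:
    Gamma(S + 1) / (prod y_i! * (n + 1)^(S + 1))."""
    n, s = len(y), sum(y)
    return (math.lgamma(s + 1) - sum(math.lgamma(v + 1) for v in y)
            - (s + 1) * math.log(n + 1))

def log_marginal_geometric(y):
    """log m1(y) for P(y_i = k) = p (1 - p)^k, p ~ Uniform(0, 1):
    the Beta integral B(n + 1, S + 1), a function of S and n only."""
    n, s = len(y), sum(y)
    return math.lgamma(n + 1) + math.lgamma(s + 1) - math.lgamma(n + s + 2)

def log_bayes_factor(y):
    """Exact log B01 (Poisson against geometric) from the full data."""
    return log_marginal_poisson(y) - log_marginal_geometric(y)

# Two hypothetical datasets sharing the same summary S(y) = sum(y) = 8, hence
# indistinguishable for any ABC model choice based on S alone.
y_a = [2, 2, 2, 2]
y_b = [8, 0, 0, 0]
print(sum(y_a), sum(y_b))
print(log_bayes_factor(y_a), log_bayes_factor(y_b))
```

The exact Bayes factor depends on the data through $\prod_i y_i!$, which $S$ does not capture: here it favours the Poisson model for `y_a` and the geometric model for `y_b`, while a summary-based answer is necessarily the same for both.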

In this paper of the special MCMSki 4 issue of Statistics and Computing, Stoehr et al. attack the recurrent problem of selecting summary statistics for ABC in a hidden Markov random field, since there is no fixed-dimension sufficient statistic in that case. The paper provides a very broad overview of the issues and difficulties related to ABC model choice, which has been the focus of advanced research for only a few years. Most interestingly, the authors define a novel, local, and somewhat Bayesian misclassification rate, an error that is conditional on the observed value and derived from the ABC reference table. It is the posterior predictive error rate

$\mathbb{P}^{\text{ABC}}\big(\hat{m}(Y)\ne m \mid S(Y)=S(y^{\text{obs}})\big)$

integrating over both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation. Or rather given the transform of the observation by the summary statistic S. The authors even go further and define the error rate of a classification rule based on a first (collection of) statistic(s), conditional on a second (collection of) statistic(s) (see Definition 1). A notion rather delicate to validate on a fully Bayesian basis. And they advocate replacing the unreliable (estimates of the) posterior probabilities with this local error rate, estimated by traditional non-parametric kernel methods. Methods that are calibrated by cross-validation. Given a reference summary statistic, this perspective leads (at least in theory) to selecting the optimal summary statistic as the one leading to the minimal local error rate. Besides its application to hidden Markov random fields, which is of interest per se, this paper thus opens a new vista on calibrating ABC methods and evaluating their true performance conditional on the actual data. (The advocated abandonment of the posterior probabilities could almost justify the denomination of a paradigm shift. This is also the approach advocated in our random forest paper.)
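The local error rate can be estimated from an ABC reference table in a few lines. The sketch below rests on hypothetical choices, not on Stoehr et al.'s hidden Markov random field setting: two Gaussian models differing only in scale, the sample variance as a single summary, a $k$-nearest-neighbour majority vote as the ABC classifier $\hat m$, and a Nadaraya-Watson (Gaussian kernel) smoother of leave-one-out misclassification indicators. For brevity, $k$ and the bandwidth are fixed by hand rather than calibrated by cross-validation.

```python
import math
import random

N_OBS = 10  # size of each simulated dataset (hypothetical choice)

def simulate(model, rng):
    # Hypothetical toy pair: y_i ~ N(theta, 1) under model 0 and
    # y_i ~ N(theta, 2^2) under model 1, with theta ~ N(0, 1) under both.
    theta = rng.gauss(0.0, 1.0)
    sd = 1.0 if model == 0 else 2.0
    return [rng.gauss(theta, sd) for _ in range(N_OBS)]

def summary(y):
    # One-dimensional summary statistic S(y): the sample variance.
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / (len(y) - 1)

def reference_table(size, rng):
    # ABC reference table of (model index, summary) pairs, uniform prior on m.
    table = []
    for _ in range(size):
        m = rng.randrange(2)
        table.append((m, summary(simulate(m, rng))))
    return table

def knn_classify(s, table, k, skip=None):
    # ABC model choice rule m_hat: majority vote among the k simulations
    # closest to s in summary space, leaving out row `skip`.
    nearest = sorted((abs(s_j - s), m_j)
                     for j, (m_j, s_j) in enumerate(table) if j != skip)[:k]
    return 1 if 2 * sum(m for _, m in nearest) > k else 0

def misclassification(table, k=25, n_valid=500):
    # Leave-one-out indicators 1{m_hat(y_j) != m_j} on the first n_valid rows.
    return [(s_j, knn_classify(s_j, table, k, skip=j) != m_j)
            for j, (m_j, s_j) in enumerate(table[:n_valid])]

def local_error_rate(s_obs, indicators, bandwidth=0.3):
    # Kernel estimate of P(m_hat(Y) != m | S(Y) = s_obs).
    num = den = 0.0
    for s_j, miss in indicators:
        w = math.exp(-0.5 * ((s_j - s_obs) / bandwidth) ** 2)
        num += w * miss
        den += w
    return num / den if den > 0 else float("nan")

rng = random.Random(1)
table = reference_table(3000, rng)
indicators = misclassification(table)
for s_obs in (0.8, 1.9, 5.0):
    print(s_obs, round(local_error_rate(s_obs, indicators), 3))
```

The estimated error should be small where one model clearly dominates (around 0.8 or 5.0) and markedly larger around 1.9, where the two models overlap: the same classifier is thus rated differently depending on the observed summary, which is the point of a conditional error rate. Comparing such estimates across candidate summaries is how the minimal-local-error selection described above would proceed.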