## Why should I be Bayesian when my model is wrong?

Posted in Books, pictures, Running, Statistics, Travel, University life on May 9, 2017 by xi'an

Guillaume Dehaene posted the above question on X validated last Friday. Here is an excerpt from it:

However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.

My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e., computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.

Indeed, according to Kleijn and van der Vaart (2012), in the misspecified case, the posterior distribution converges as n→∞ to a Dirac distribution centred at the MLE, but does not have the correct variance (unless the two just happen to be the same) to ensure that credible intervals of the posterior match confidence intervals for θ.

Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)

A few thoughts about that meme that all models are wrong (resonating from last week’s discussion):

1. While the hypothetical model is indeed almost invariably and irremediably wrong, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any);
2. There exist Bayesian approaches that can do without the model, a most recent example being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
3. In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
4. And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes into account model misspecification to replace the likelihood with a down-graded version expressed as a power of the original likelihood (a minimal sketch of this tempering idea follows the list).
5. The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a circumvoluted manner (and I added some comments on my blog).
6. In a sense, Bayesians should be the least concerned among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.
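For the curious, here is a minimal R sketch of the tempering idea behind point 4, on a toy normal-mean model computed over a grid. The data, prior, and fixed learning rate η are illustrative choices of mine: the actual substance of SafeBayes lies in selecting η from the data, which this sketch deliberately omits.

```r
## Tempered ("SafeBayes-style") posterior on a toy normal-mean model:
## the likelihood is raised to a power eta in (0,1] before being
## combined with the prior. Here eta is fixed by hand, whereas
## Gruenwald's SafeBayes selects it from the data.
set.seed(42)
x <- rnorm(50, mean = 1, sd = 2)            # data, model N(theta, 2^2)
theta <- seq(-2, 4, length.out = 1000)      # grid for the mean
eta <- 0.5                                  # fixed learning rate

log_lik <- sapply(theta, function(t) sum(dnorm(x, t, 2, log = TRUE)))
log_prior <- dnorm(theta, 0, 10, log = TRUE)

## tempered posterior, proportional to prior x likelihood^eta
log_post <- log_prior + eta * log_lik
post <- exp(log_post - max(log_post))
post <- post / (sum(post) * diff(theta)[1])  # normalise on the grid

plot(theta, post, type = "l", xlab = expression(theta),
     ylab = "tempered posterior density")
```

Setting η=1 recovers the usual posterior, while smaller values widen it, which is how the approach hedges against misspecification.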

## inference with Wasserstein distance

Posted in Books, Statistics, University life on January 23, 2017 by xi'an

Today, Pierre Jacob posted on arXiv a paper of ours on the use of the Wasserstein distance in statistical inference, whose main focus is exploiting this distance to create an automated measure of discrepancy for ABC. Which is why the full title is Inference in generative models using the Wasserstein distance. Generative obviously standing for the case when a model can be generated from but cannot be associated with a closed-form likelihood. We had all together discussed this notion when I visited Harvard and Pierre last March, with much excitement. (While I have not contributed much more than that round of discussions and ideas to the paper, the authors kindly included me!) The paper contains theoretical results on the consistency of statistical inference based on those distances, as well as computational ones on how the computation of these distances is practically feasible and on how the Hilbert space-filling curve used in sequential quasi-Monte Carlo can help. The notion further extends to dependent data via delay reconstruction and residual reconstruction techniques (as we did for some models in our empirical likelihood BCel paper). I am quite enthusiastic about this approach and look forward to discussing it at the 17w5015 BIRS ABC workshop, next month!
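To give a concrete, if simplistic, illustration: in one dimension the Wasserstein distance between two empirical distributions of equal size reduces to the mean absolute difference between order statistics, which makes the following minimal R sketch of Wasserstein-based ABC possible. The gamma toy model and the 1% tolerance are illustrative choices of mine, not taken from the paper.

```r
## Minimal ABC sketch with a one-dimensional Wasserstein discrepancy.
## For equal-size univariate samples, the L1-Wasserstein distance
## between empirical distributions is the mean absolute difference of
## the order statistics. Toy gamma model and tolerance illustrative only.
set.seed(17)
wasserstein1d <- function(x, y) mean(abs(sort(x) - sort(y)))

y_obs <- rgamma(100, shape = 3, rate = 1)   # pretend-observed data

n_sim <- 1e4
shape <- runif(n_sim, 0, 10)                # prior on the gamma shape
dists <- numeric(n_sim)
for (i in 1:n_sim) {
  z <- rgamma(100, shape = shape[i], rate = 1)  # generate from the model
  dists[i] <- wasserstein1d(y_obs, z)
}

## keep the 1% of simulated parameters closest in Wasserstein distance
keep <- dists <= quantile(dists, 0.01)
hist(shape[keep], main = "ABC posterior (Wasserstein)", xlab = "shape")
```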

## running out of explanations

Posted in Books, Kids, Statistics on September 23, 2015 by xi'an

A few days ago, I answered a self-study question on Cross Validated about the convergence in probability of 1/X given the convergence in probability of X to a. Until I ran out of explanations… I did not see how to detail any further the connection between both properties! The reader (OP) started from a resolution of the corresponding exercise in Casella and Berger’s Statistical Inference and could not follow the steps, some of which were incorrect. But my attempts at making him uncover the necessary steps failed, presumably because he was sticking to this earlier resolution rather than starting from the definition of convergence in probability. And he could not get over the equality

$\mathbb{P}(|a/X_i - 1| < \epsilon)=\mathbb{P}\left(a-\frac{a\epsilon}{1 + \epsilon} < X_i < a + \frac{a\epsilon}{1 - \epsilon}\right)$

which is the central reason why one convergence transfers to the other… I know I know nothing, and even less about pedagogy, but it is (just so mildly!) frustrating to hit a wall beyond which no further explanation can help! Feel free to propose an alternative resolution.
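For the record, here is how I would spell out the missing step, assuming a > 0, 0 < ε < 1, and X_i positive (which holds with probability going to one):

$\left|\frac{a}{X_i}-1\right|<\epsilon \iff 1-\epsilon<\frac{a}{X_i}<1+\epsilon \iff \frac{a}{1+\epsilon}<X_i<\frac{a}{1-\epsilon}$

and the two bounds are exactly $a-a\epsilon/(1+\epsilon)$ and $a+a\epsilon/(1-\epsilon)$. Since $X_i$ converges in probability to $a$, $\mathbb{P}(|X_i-a|<\delta)$ goes to one for every $\delta>0$; taking $\delta=a\epsilon/(1+\epsilon)$, which is smaller than $a\epsilon/(1-\epsilon)$, the interval $(a-\delta,a+\delta)$ sits inside the interval above, hence the probability on the right-hand side goes to one and $a/X_i$ converges in probability to 1, i.e., $1/X_i$ to $1/a$.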

Update: A few days later, readers of Cross Validated pointed out that the question had been answered by whuber in a magisterial way. But I wonder if my original reader appreciated this resolution, since he did not pursue the issue.

## full Bayesian significance test

Posted in Books, Statistics on December 18, 2014 by xi'an

Among the many comments (thanks!) I received when posting our Testing via mixture estimation paper came the suggestion to relate this approach to the notion of full Bayesian significance test (FBST) developed by (Julio, not Hal) Stern and Pereira, from São Paulo, Brazil. I thus had a look at this alternative and read the Bayesian Analysis paper they published in 2008, as well as a paper recently published in Logic Journal of the IGPL. (I could not find what IGPL stands for.) The central notion in these papers is the e-value, which provides the posterior probability that the posterior density is larger than the largest posterior density over the null set. This definition bothers me, first because the null set has a measure equal to zero under an absolutely continuous prior (BA, p.82). Hence the posterior density is defined in an arbitrary manner over the null set and the maximum is itself arbitrary. (An issue that invalidates my 1993 version of the Lindley-Jeffreys paradox!) And second because it considers the posterior probability of an event that does not exist a priori, being conditional on the data. This sounds in fact quite similar to Murray Aitkin’s (2009) book Statistical Inference, which uses a posterior distribution of the likelihood function. With the same drawback of using the data twice. And the other issues discussed in our commentary on the book. (As a side-much-on-the-side remark, the authors incidentally forgot me when citing our 1992 Annals of Statistics paper about decision theory on accuracy estimators…!)
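To make the definition concrete, here is a minimal Monte Carlo sketch of this e-value in a conjugate normal toy model with a point null; the data, prior, and null are illustrative assumptions of mine, not taken from the BA paper.

```r
## Monte Carlo sketch of the FBST e-value in a toy normal-normal model:
## x | theta ~ N(theta, 1), theta ~ N(0, 10^2), point null H0: theta = 0.
## As described above, the e-value is the posterior probability that the
## posterior density exceeds its supremum over the null set; with a
## point null, that supremum is just the posterior density at zero.
set.seed(7)
x <- rnorm(20, mean = 0.4, sd = 1)          # toy data

## conjugate posterior N(m, v)
v <- 1 / (length(x) + 1 / 100)
m <- v * sum(x)
post_dens <- function(t) dnorm(t, m, sqrt(v))

sup_null <- post_dens(0)                    # sup of posterior density on H0
theta <- rnorm(1e5, m, sqrt(v))             # posterior draws
ev <- mean(post_dens(theta) > sup_null)     # tangential-set probability
ev  # large values count against H0; Pereira & Stern's ev is 1 minus this
```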

## Nonlinear Time Series just appeared

Posted in Books, R, Statistics, University life on February 26, 2014 by xi'an

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and Its Applications with Robert Shumway a year ago.) The book reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The latter includes MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”), more adequate for PhD students and researchers than for starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) comes to balance the mathematical bent of the book in the first and third parts. A great reference book!

## Do we need…yes we do (with some delay)!

Posted in Books, Statistics, University life on April 4, 2013 by xi'an