## Archive for statistical inference

## absurd prices on Amazon

Posted in Statistics with tags Amazon, George Casella, pricing algorithm, statistical inference on November 30, 2019 by xi'an## severe testing or severe sabotage? [not a book review]

Posted in Books, pictures, Statistics, University life with tags Cambridge University Press, commercial editor, cup, Deborah Mayo, philosophy of sciences, print on demand, severe testing, statistical inference, statistics wars, testing of hypotheses on October 16, 2018 by xi'an**L**ast week, I received this new book of Deborah Mayo, which I was looking forward reading and annotating!, but thrice alas, the book had been sabotaged: except for the preface and acknowledgements, the entire book is printed upside down [a minor issue since the entire book is concerned] and with some part of the text cut on each side [a few letters each time but enough to make reading a chore!]. I am thus waiting for a tested copy of the book to start reading it in earnest!

## Why should I be Bayesian when my model is wrong?

Posted in Books, pictures, Running, Statistics, Travel, University life with tags all models are wrong, Bayesian foundations, cross validated, Cymru, Gregynog, misspecified model, Powys, Spring, statistical inference, Wales on May 9, 2017 by xi'an**G**uillaume Dehaene posted the above question on X validated last Friday. Here is an except from it:

However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.

My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e: computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.

Indeed, according to Kleijn, v.d Vaart (2012), in the misspecified case, the posterior distribution converges as n→∞ to a Dirac distribution centred at the MLE but does not have the correct variance (unless two values just happen to be same) in order to ensure that credible intervals of the posterior match confidence intervals for θ.

Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)

A few thoughts about that meme that *all models are wrong*: (resonating from last week discussion):

- While the hypothetical model is indeed almost invariably and irremediably
*wrong*, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any); - There exist Bayesian approaches that can do
*without the model*, a most recent example being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau); - In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
- And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes into account model misspecification to replace the likelihood with a down-graded version expressed as a power of the original likelihood.
- The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a circumvoluted manner (and I added some comments on my blog).
- In a sense, Bayesians should be
*the least concerned*among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.

## inference with Wasserstein distance

Posted in Books, Statistics, University life with tags 17w5025, adaptive Monte Carlo algorithm, Banff, BIRS, Canada, empirical distribution, Harvard University, numerical transport, optimal transport, statistical inference, synthetic data, Wasserstein distance on January 23, 2017 by xi'an**T**oday, Pierre Jacob posted on arXiv a paper of ours on the use of the Wasserstein distance in statistical inference, which main focus is exploiting this distance to create an automated measure of discrepancy for ABC. Which is why the full title is Inference in generative models using the Wasserstein distance. Generative obviously standing for the case when a model can be generated from but cannot be associated with a closed-form likelihood. We had all together discussed this notion when I visited Harvard and Pierre last March, with much excitement. (While I have not contributed much more than that round of discussions and ideas to the paper, the authors kindly included me!) The paper contains theoretical results for the consistency of statistical inference based on those distances, as well as computational on how the computation of these distances is practically feasible and on how the Hilbert space-filling curve used in sequential quasi-Monte Carlo can help. The notion further extends to dependent data via delay reconstruction and residual reconstruction techniques (as we did for some models in our empirical likelihood BCel paper). I am quite enthusiastic about this approach and look forward discussing it at the 17w5015 BIRS ABC workshop, next month!

## running out of explanations

Posted in Books, Kids, Statistics with tags convergence in probability, cross validated, George Casella, pedagogy, proof, statistical inference on September 23, 2015 by xi'an**A** few days ago, I answered a self-study question on Cross Validated about the convergence in probability of 1/X given the convergence in probability of X to a. Until I ran out of explanations… I did not see how to detail any further the connection between both properties! The reader (OP) started from a resolution of the corresponding exercise in Casella and Berger’s Statistical Inference and could not follow the steps, some of which were incorrect. But my attempts at making him uncover the necessary steps failed, presumably because he was sticking to this earlier resolution rather than starting from the definition of convergence in probability. And he could not get over the equality

which is the central reason why one convergence transfers to the other… I know I know nothing, and even less about pedagogy, but it is (just so mildly!) frustrating to hit a wall beyond which no further explanation can help! Feel free to propose an alternative resolution.

**Update:** A few days later, readers of Cross Validated pointed out that the question had been answered by whuber in a magisterial way. But I wonder if my original reader appreciated this resolution, since he did not pursue the issue.

## full Bayesian significance test

Posted in Books, Statistics with tags Bayes factor, Bayesian Analysis, Bayesian model choice, e-values, full Bayesian significance test, logic journal of the IGPL, measure theory, Murray Aitkin, p-values, São Paulo, statistical inference on December 18, 2014 by xi'an**A**mong the many comments (thanks!) I received when posting our Testing via mixture estimation paper came the suggestion to relate this approach to the notion of full Bayesian significance test (FBST) developed by (Julio, not Hal) Stern and Pereira, from São Paulo, Brazil. I thus had a look at this alternative and read the Bayesian Analysis paper they published in 2008, as well as a paper recently published in Logic Journal of IGPL. (I could not find what the IGPL stands for.) The central notion in these papers is the *e-value*, which provides the *posterior probability that the posterior density is larger than the largest posterior density over the null set*. This definition bothers me, first because the *null* set has a measure equal to zero under an absolutely continuous prior (BA, p.82). Hence the posterior density is defined in an arbitrary manner over the *null* set and the maximum is itself arbitrary. (An issue that invalidates my 1993 version of the Lindley-Jeffreys paradox!) And second because it considers the posterior probability of an event that does not exist a priori, being conditional on the data. This sounds in fact quite similar to *Statistical Inference*, Murray Aitkin’s (2009) book using a posterior distribution of the likelihood function. With the same drawback of using the data twice. And the other issues discussed in our commentary of the book. (As a side-much-on-the-side remark, the authors incidentally forgot me when citing our 1992 Annals of Statistics paper about decision theory on accuracy estimators..!)