Archive for statistical inference
absurd prices on Amazon
Posted in Statistics with tags Amazon, George Casella, pricing algorithm, statistical inference on November 30, 2019 by xi'an
severe testing or severe sabotage? [not a book review]
Posted in Books, pictures, Statistics, University life with tags Cambridge University Press, commercial editor, cup, Deborah Mayo, philosophy of sciences, print on demand, severe testing, statistical inference, statistics wars, testing of hypotheses on October 16, 2018 by xi'an
Last week, I received this new book by Deborah Mayo, which I was looking forward to reading and annotating!, but thrice alas, the book had been sabotaged: except for the preface and acknowledgements, the entire book is printed upside down [a minor issue since the entire book is concerned] and with part of the text cut off on each side [a few letters each time, but enough to make reading a chore!]. I am thus waiting for a tested copy of the book to start reading it in earnest!
Why should I be Bayesian when my model is wrong?
Posted in Books, pictures, Running, Statistics, Travel, University life with tags all models are wrong, Bayesian foundations, cross validated, Cymru, Gregynog, misspecified model, Powys, Spring, statistical inference, Wales on May 9, 2017 by xi'an
Guillaume Dehaene posted the above question on X validated last Friday. Here is an excerpt from it:
However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.
My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e: computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.
Indeed, according to Kleijn and van der Vaart (2012), in the misspecified case, the posterior distribution converges as n→∞ to a Dirac distribution centred at the MLE but does not have the correct variance (unless the two values just happen to be the same) to ensure that credible intervals of the posterior match confidence intervals for θ.
Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)
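To see the coverage mismatch concretely, here is a minimal simulation sketch of my own (an illustration, not taken from the question or from Kleijn and van der Vaart): fit a fixed-variance N(θ,1) model to data actually drawn from N(0,4). The flat-prior posterior is N(x̄,1/n), so its 95% credible interval is much too short, while a sandwich-type interval based on the empirical variance recovers nominal frequentist coverage.

```python
# Toy illustration of the Kleijn & van der Vaart (2012) phenomenon:
# under misspecification, credible intervals can badly miss the
# nominal frequentist coverage.  (My own sketch, not from the paper.)
import numpy as np

rng = np.random.default_rng(0)
n, reps, true_sd = 100, 10_000, 2.0   # the model wrongly assumes sd = 1
cover_bayes = cover_sandwich = 0

for _ in range(reps):
    x = rng.normal(0.0, true_sd, n)   # data truly from N(0, 4)
    theta_hat = x.mean()              # MLE of theta under the N(theta, 1) model
    # flat-prior posterior is N(theta_hat, 1/n)
    half_bayes = 1.96 / np.sqrt(n)
    # sandwich (robust) interval uses the empirical standard deviation
    half_sand = 1.96 * x.std(ddof=1) / np.sqrt(n)
    cover_bayes += abs(theta_hat) < half_bayes
    cover_sandwich += abs(theta_hat) < half_sand

print(f"credible interval coverage: {cover_bayes / reps:.3f}")    # about 0.67
print(f"sandwich interval coverage: {cover_sandwich / reps:.3f}") # about 0.95
```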
A few thoughts about the meme that all models are wrong (resonating with last week's discussion):
- While the hypothetical model is indeed almost invariably and irremediably wrong, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any);
- There exist Bayesian approaches that can do without the model, the most recent examples being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
- In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
- And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes model misspecification into account by replacing the likelihood with a downgraded version expressed as a power of the original likelihood (see the sketch after this list).
- The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a circumvoluted manner (and I added some comments on my blog).
- In a sense, Bayesians should be the least concerned among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.
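On the SafeBayes direction above, here is a minimal sketch of a power (tempered) posterior in the conjugate Normal case; note that Grünwald's actual SafeBayes learns the power η from the data, whereas this toy fixes it by hand. Raising the likelihood to η<1 inflates the posterior variance, partly compensating for the misspecified N(θ,1) model.

```python
# Minimal sketch (mine, not Grünwald's SafeBayes algorithm) of a
# tempered posterior: prior x likelihood^eta, with eta in (0, 1].
# The model N(theta, 1) with a N(0, tau2) prior is conjugate, so the
# tempered posterior stays Gaussian with precision eta*n + 1/tau2.
import numpy as np

def power_posterior(x, eta, tau2=10.0):
    """Mean and variance of the tempered posterior for N(theta, 1)."""
    prec = eta * len(x) + 1.0 / tau2   # tempered posterior precision
    return eta * x.sum() / prec, 1.0 / prec

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, 200)          # misspecified: true sd is 2, not 1
for eta in (1.0, 0.5, 0.25):
    m, v = power_posterior(x, eta)
    print(f"eta = {eta}: posterior sd = {np.sqrt(v):.3f}")
# the posterior standard deviation grows as eta shrinks
```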
inference with Wasserstein distance
Posted in Books, Statistics, University life with tags 17w5025, adaptive Monte Carlo algorithm, Banff, BIRS, Canada, empirical distribution, Harvard University, numerical transport, optimal transport, statistical inference, synthetic data, Wasserstein distance on January 23, 2017 by xi'an
Today, Pierre Jacob posted on arXiv a paper of ours on the use of the Wasserstein distance in statistical inference, whose main focus is exploiting this distance to create an automated measure of discrepancy for ABC. Which is why the full title is Inference in generative models using the Wasserstein distance. Generative obviously stands for the case when a model can be generated from but cannot be associated with a closed-form likelihood. We had all together discussed this notion when I visited Harvard and Pierre last March, with much excitement. (While I have not contributed much more than that round of discussions and ideas to the paper, the authors kindly included me!) The paper contains theoretical results on the consistency of statistical inference based on those distances, as well as computational results on how the computation of these distances is practically feasible and on how the Hilbert space-filling curve used in sequential quasi-Monte Carlo can help. The notion further extends to dependent data via delay reconstruction and residual reconstruction techniques (as we did for some models in our empirical likelihood BCel paper). I am quite enthusiastic about this approach and look forward to discussing it at the 17w5025 BIRS ABC workshop next month!
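For a flavour of the approach, here is a toy rejection-ABC sketch of my own, much cruder than the algorithms in the paper: in the univariate case with equal sample sizes, the 1-Wasserstein distance between two empirical distributions reduces to the mean absolute difference between the sorted samples, which can then serve directly as the ABC discrepancy. (The Gamma model and the uniform prior below are arbitrary choices for illustration.)

```python
# Toy Wasserstein-ABC sketch (mine, not the paper's implementation):
# accept parameter draws whose synthetic data fall closest to the
# observed sample in 1-Wasserstein distance.
import numpy as np

rng = np.random.default_rng(2)
n = 200
y_obs = np.sort(rng.gamma(shape=3.0, scale=1.0, size=n))  # "observed" data

def w1(sorted_a, sorted_b):
    """W1 between two equal-size empirical distributions on the line."""
    return np.mean(np.abs(sorted_a - sorted_b))

# rejection ABC: draw shape ~ U(0.1, 10), keep the closest 1% of draws
draws, dists = [], []
for _ in range(20_000):
    shape = rng.uniform(0.1, 10.0)
    z = np.sort(rng.gamma(shape=shape, scale=1.0, size=n))
    draws.append(shape)
    dists.append(w1(y_obs, z))

keep = np.argsort(dists)[:200]
post = np.asarray(draws)[keep]
print(f"ABC posterior for the shape: {post.mean():.2f} ± {post.std():.2f}")
```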