## Archive for statistical modelling

## a summer of British conferences!

Posted in pictures, Statistics, Travel, University life with tags BAYSM 2018, Britain, conference, Edinburgh, England, ISBA 2018, iwsm2018, statistical modelling, University of Bristol, Warwick on January 18, 2018 by xi'an

## model misspecification in ABC

Posted in Statistics with tags ABC, all models are wrong, Australia, likelihood-free methods, Melbourne, Mission Beach, model mispecification, Monash University, statistical modelling on August 21, 2017 by xi'an

**W**ith David Frazier and Judith Rousseau, we just arXived a paper studying the impact of a misspecified model on the outcome of an ABC run. This is a question that naturally arises when using ABC, but that has not been directly covered in the literature apart from a recently arXived paper by James Ridgway [that was commented on the ‘Og earlier this month]. On the one hand, ABC can be seen as a robust method in that it focuses on the aspects of the assumed model that are translated by the [insufficient] summary statistics and their expectation. And nothing else. It is thus tolerant of departures from the hypothetical model that [almost] preserve those moments. On the other hand, ABC involves a degree of non-parametric estimation of the intractable likelihood, which may sound even more robust, except that the likelihood is estimated from pseudo-data simulated from the “wrong” model in case of misspecification.

In the paper, we examine how the pseudo-true value of the parameter [that is, the value of the parameter of the misspecified model that comes closest to the generating model in terms of Kullback-Leibler divergence] is asymptotically reached by some ABC algorithms like the ABC accept/reject approach and not by others like the popular linear regression [post-simulation] adjustment, which surprisingly concentrates posterior mass on a completely different pseudo-true value. Exploiting our recent assessment of ABC convergence for well-specified models, we show the above convergence result for a tolerance sequence that decreases to the minimum possible distance [between the true expectation and the misspecified expectation] at a slow enough rate, or such that the sequence of acceptance probabilities goes to zero at the proper speed. In the case of the regression correction, the pseudo-true value is shifted by a quantity that does not converge to zero, because of the misspecification in the expectation of the summary statistics. This is not immensely surprising but we hence get a very different picture when compared with the well-specified case, where regression corrections bring improvement to the asymptotic behaviour of the ABC estimators. This discrepancy between two versions of ABC can be exploited to seek misspecification diagnoses, e.g. through the acceptance rate versus the tolerance level, or via a comparison of the ABC approximations to the posterior expectations of quantities of interest, which should diverge at rate √n. In both cases, ABC reference tables/learning bases can be exploited to draw and calibrate a comparison with the well-specified case.
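For readers who want to play with the two ABC versions contrasted above, here is a minimal Python sketch, not the code behind the paper. Everything specific in it is an assumption made for illustration: data generated from a N(1, 2²) distribution, an assumed N(θ, 1) model, a N(0, 5²) prior, the sample mean and variance as summaries, a tolerance set as the 1% quantile of the simulated distances, and a plain unweighted least-squares fit standing in for the local linear regression adjustment. The function name `abc_misspecified_demo` is likewise made up.

```python
import numpy as np


def abc_misspecified_demo(n_obs=100, n_sim=20000, keep=0.01, seed=0):
    """Toy ABC run under misspecification (all settings are illustrative)."""
    rng = np.random.default_rng(seed)

    # Observed data come from N(1, 2^2), while the assumed model is N(theta, 1):
    # the variance summary can never be matched by the assumed model.
    y = rng.normal(1.0, 2.0, size=n_obs)
    s_obs = np.array([y.mean(), y.var(ddof=1)])

    # Reference table: prior draws and summaries of the pseudo-data.
    theta = rng.normal(0.0, 5.0, size=n_sim)
    z = rng.normal(theta[:, None], 1.0, size=(n_sim, n_obs))
    s_sim = np.column_stack([z.mean(axis=1), z.var(axis=1, ddof=1)])

    # Accept/reject step: keep the draws whose summaries are closest to s_obs.
    dist = np.linalg.norm(s_sim - s_obs, axis=1)
    eps = np.quantile(dist, keep)
    acc = dist <= eps
    theta_acc, s_acc = theta[acc], s_sim[acc]

    # Post-simulation adjustment: regress accepted theta on centred summaries,
    # then move each draw to where the fit puts it at s_obs (unweighted fit).
    design = np.column_stack([np.ones(theta_acc.size), s_acc - s_obs])
    coef, *_ = np.linalg.lstsq(design, theta_acc, rcond=None)
    theta_adj = theta_acc - (s_acc - s_obs) @ coef[1:]

    return {
        "s_obs": s_obs,
        "eps": float(eps),
        "n_accepted": int(acc.sum()),
        "accept_reject_mean": float(theta_acc.mean()),
        "regression_adjusted_mean": float(theta_adj.mean()),
    }


if __name__ == "__main__":
    for key, value in abc_misspecified_demo().items():
        print(key, value)
```

In this toy setting the tolerance cannot shrink to zero, since the simulated variances stay near one while the observed variance is near four, and the regression step extrapolates the accepted summaries across that gap. Comparing the two posterior means, or tracking the acceptance rate against the tolerance, is the kind of informal diagnosis mentioned above.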

## beyond objectivity, subjectivity, and other ‘bjectivities

Posted in Statistics with tags Andrew Gelman, Christian Hennig, discussion paper, Errol Street, frequentist inference, London, objectivism, Read paper, Royal Statistical Society, RSS, Series A, statistical modelling, subjective versus objective Bayes, subjectivity on April 12, 2017 by xi'an

**H**ere is my discussion of Gelman and Hennig at the Royal Statistical Society, which I am about to deliver!

## Statistical rethinking [book review]

Posted in Books, Kids, R, Statistics, University life with tags Amazon, Bayes theorem, Bayesian data analysis, Bayesian Essentials with R, book review, CHANCE, code, convergence diagnostics, E.T. Jaynes, generalised linear models, golem, maths, matrix algebra, MCMC algorithms, mixtures of distributions, Monte Carlo Statistical Methods, Prague, R, robots, STAN, statistical modelling, Statistical rethinking on April 6, 2016 by xi'an

Statistical Rethinking: A Bayesian Course with Examples in R and Stan is a new book by Richard McElreath that CRC Press sent me for review in CHANCE. While the book was already discussed on Andrew’s blog three months ago, and [rightly so!] enthusiastically recommended by Rasmus Bååth on Amazon, here are the reasons why I am quite impressed by Statistical Rethinking!

“Make no mistake: you will wreck Prague eventually.” (p.10)

While the book has a lot in common with Bayesian Data Analysis, from being in the same CRC series to adopting a pragmatic and weakly informative approach to Bayesian analysis, to supporting the use of STAN, it also nicely develops its own ecosystem and idiosyncrasies, with a noticeable Jaynesian bent. To start with, I like the highly personal style with clear attempts to make the concepts memorable for students by resorting to external concepts. The best example is the call to the myth of the golem in the first chapter, which McElreath uses as a warning for the use of statistical models (which are almost anagrams of golems!). Golems and models [and robots, another concept invented in Prague!] are man-made devices that strive to accomplish the goal set to them without heeding the consequences of their actions. This first chapter of Statistical Rethinking is setting the ground for the rest of the book and gets quite philosophical (albeit in a readable way!) as a result. In particular, there is a most coherent call against hypothesis testing, which by itself justifies the title of the book.

## interesting mis-quote

Posted in Books, pictures, Statistics, Travel, University life with tags Alan Turing, all models are wrong, artificial intelligence, George Box, misquote, Peter Norvig, statistical modelling, The End of Theory, Thomas Bayes on September 25, 2014 by xi'an

**A**t a recent conference on Big Data, one speaker mentioned this quote from Peter Norvig, the director of research at Google:

“All models are wrong, and increasingly you can succeed without them.”

A quote that I found rather shocking, esp. when considering the amount of modelling behind Google tools. And coming from someone citing Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini as one of his favourite books and Bayesian Data Analysis as another one… Or displaying Bayes [or his alleged portrait] and Turing on his book cover. So I went searching on the Web for more information about this surprising quote. And found the explanation, as given by Peter Norvig himself:

“To set the record straight: That’s a silly statement, I didn’t say it, and I disagree with it.”

Which means that weird quotes have a high probability of being misquotes. And used by others to (obviously) support their own agenda. In the current case, Chris Anderson and his End of Theory paradigm. Briefly and mildly discussed by Andrew a few years ago.

## Statistics second slides

Posted in Books, Kids, Statistics, University life with tags blood pressure, exponential families, logistic regression, statistical modelling, undergraduates, Université Paris Dauphine on September 24, 2014 by xi'an

**T**his is the next chapter of my Statistics course, definitely more standard, with some notions on statistical models, limit theorems, and exponential families. In the first class, I recalled the convergence notions with no proof but counterexamples and spent some time on a slide not included here, borrowed from Chris Holmes’ talk last Friday on the linear relation between blood pressure and the log odds ratio of a heart condition. This was a great example, both to illustrate the power of increasing the number of observations and of using a logistic regression model. Students kept asking questions about it.

## 10w2170, Banff

Posted in Books, Mountains, R, Statistics with tags Banff, BIRS, Ecology, forestry, Gran Paradiso, hierarchical Bayesian modelling, Oberwolfach, statistical modelling, University of Alberta on September 11, 2010 by xi'an

**Y**esterday night, we started the **Hierarchical Bayesian Methods in Ecology** workshop by trading stories. Everyone involved in the programme discussed his/her favourite dataset and corresponding expectations from the course. I found the exchange most interesting, like the one we had two years ago in Gran Paradiso, because of the diversity of approaches to Statistics reflected by the exposition. However, a constant theme is the desire to compare and rank models (this term having different meanings for different students) and the understanding that hierarchical models are a superior way to handle heterogeneity and to gather strength from the whole dataset. A two-day workshop is certainly too short to meet students’ expectations and I hope I will manage to focus on the concepts rather than on the maths and computations…

**A**s each time I come here, the efficiency of BIRS in handling the workshop and making everything run smoothly amazes me. Except for the library, I think it really compares with Oberwolfach in terms of environment and working facilities. (Oberwolfach offers the appeal of seclusion and the Black Forest, while BIRS provides summits all around plus the range of facilities of the Banff Centre and the occasional excitement of a bear crossing the campus or a cougar killing a deer on its outskirts…)