Archive for statistical modelling

missing bit?

Posted in Books, Statistics, University life on January 9, 2021 by xi'an

Nature of 7 December 2020 has a Nature Index (a supplement made of a series of articles, more journalistic than scientific, with corporate backup, which “have no influence over the content”) on Artificial Intelligence, including the above graph representing “the top 200 collaborations among 146 institutions based between 2015 and 2019, sized according to each institution’s share in artificial intelligence”, with only the UK, Germany, Switzerland and Italy identified for Europe… Missing, e.g., the output from France and from its major computer science institute, INRIA. Maybe because “the articles picked up by [their] database search concern specific applications of AI in the life sciences, physical sciences, chemistry, and Earth and environmental sciences”. Or maybe because of the identification of INRIA as such.

“Access to massive data sets on which to train machine-learning systems is one advantage that both the US and China have. Europe, on the other hand, has stringent data laws, which protect people’s privacy, but limit its resources for training AI algorithms. So, it seems unlikely that Europe will produce very sophisticated AI as a consequence”

This comment sort of contradicts the attached articles calling for a more ethical AI, like making AI more transparent and robust. While unrestricted access to personal data helps with the social engineering and control favoured by dictatorships and corporate behemoths, a culture of data privacy may (and should) lead to the development of new methodology for working with protected data (as in an Alan Turing Institute project) and infuse more trust from the public. Working with less data does not mean less sophistication in handling it, but the opposite! Another clash of events appears in one of the six trailblazers portrayed in the special supplement being Timnit Gebru, “former co-lead of the Ethical AI Team at Google”, who parted ways with Google at the time the issue was published. (See Andrew’s blog for a discussion of her firing. And the MIT Technology Review for an analysis of the paper potentially at the source of it.)

a summer of British conferences!

Posted in pictures, Statistics, Travel, University life on January 18, 2018 by xi'an

model misspecification in ABC

Posted in Statistics on August 21, 2017 by xi'an

With David Frazier and Judith Rousseau, we just arXived a paper studying the impact of a misspecified model on the outcome of an ABC run. This is a question that naturally arises when using ABC, but that has not been directly covered in the literature, apart from a recently arXived paper by James Ridgway [that was commented on the ‘Og earlier this month]. On the one hand, ABC can be seen as a robust method in that it focuses on the aspects of the assumed model that are translated by the [insufficient] summary statistics and their expectation. And nothing else. It is thus tolerant of departures from the hypothetical model that [almost] preserve those moments. On the other hand, ABC involves a degree of non-parametric estimation of the intractable likelihood, which may sound even more robust, except that the likelihood is estimated from pseudo-data simulated from the “wrong” model in case of misspecification.
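To make the accept/reject mechanism concrete, here is a minimal toy sketch of plain rejection ABC under misspecification; the observed data, the assumed normal model, the uniform prior, and the mean summary are all hypothetical illustrations, not the setup of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data come from a skewed distribution, while the assumed
# model is N(theta, 1): the model is misspecified, yet the [insufficient]
# mean summary can still be matched.
obs = rng.gamma(shape=2.0, scale=1.0, size=100)
s_obs = obs.mean()  # insufficient summary statistic

def abc_reject(s_obs, n_sims=100_000, tol=0.05):
    """Plain ABC accept/reject: draw theta from the prior, simulate
    pseudo-data from the assumed model, and keep the draws whose
    simulated summary falls within tol of the observed summary."""
    theta = rng.uniform(-5, 10, size=n_sims)              # prior draws
    pseudo = rng.normal(theta[:, None], 1.0, size=(n_sims, 100))
    s_sim = pseudo.mean(axis=1)
    return theta[np.abs(s_sim - s_obs) < tol]

post = abc_reject(s_obs)
```

The accepted draws concentrate near the value of theta that matches the expectation of the summary, i.e. the pseudo-true value, even though no theta makes the normal model correct.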

In the paper, we examine how the pseudo-true value of the parameter [that is, the value of the parameter of the misspecified model that comes closest to the generating model in terms of Kullback-Leibler divergence] is asymptotically reached by some ABC algorithms, like the ABC accept/reject approach, but not by others, like the popular linear regression [post-simulation] adjustment, which surprisingly concentrates posterior mass on a completely different pseudo-true value. Exploiting our recent assessment of ABC convergence for well-specified models, we show the above convergence result for a tolerance sequence that decreases to the minimum possible distance [between the true expectation and the misspecified expectation] at a slow enough rate, or for a sequence of acceptance probabilities that goes to zero at the proper speed. In the case of the regression correction, the pseudo-true value is shifted by a quantity that does not converge to zero, because of the misspecification in the expectation of the summary statistics. This is not immensely surprising, but we hence get a very different picture from the well-specified case, where regression corrections improve the asymptotic behaviour of the ABC estimators. This discrepancy between the two versions of ABC can be exploited to seek misspecification diagnoses, e.g., through the acceptance rate versus the tolerance level, or via a comparison of the ABC approximations to the posterior expectations of quantities of interest, which should diverge at rate √n. In both cases, ABC reference tables/learning bases can be exploited to draw and calibrate a comparison with the well-specified case.
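For readers unfamiliar with the regression correction mentioned above, here is a minimal sketch of a Beaumont-style local-linear adjustment; the accepted draws, summaries, and numbers are made up for illustration and do not reproduce any experiment from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_regression_adjust(theta, s_sim, s_obs):
    """Post-simulation linear regression adjustment: regress the accepted
    parameter draws on their simulated summaries, then shift each draw
    as if its summary had been the observed one. Under misspecification
    the shift beta * (s_obs - s_sim) need not vanish asymptotically,
    which is how mass ends up on a different pseudo-true value."""
    X = np.column_stack([np.ones_like(s_sim), s_sim])
    beta, *_ = np.linalg.lstsq(X, theta, rcond=None)
    return theta + beta[1] * (s_obs - s_sim)

# Hypothetical accepted draws and summaries from an earlier rejection step:
theta_acc = rng.normal(2.0, 0.3, size=500)
s_acc = theta_acc + rng.normal(0.0, 0.1, size=500)
adjusted = abc_regression_adjust(theta_acc, s_acc, s_obs=2.0)
```

In the well-specified case this adjustment tightens the ABC sample around the posterior; the point of the paper is that the same shift misbehaves when the summaries' expectations cannot be matched under the assumed model.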

beyond objectivity, subjectivity, and other ‘bjectivities

Posted in Statistics on April 12, 2017 by xi'an

Here is my discussion of Gelman and Hennig at the Royal Statistical Society, which I am about to deliver!

Statistical rethinking [book review]

Posted in Books, Kids, R, Statistics, University life on April 6, 2016 by xi'an

Statistical Rethinking: A Bayesian Course with Examples in R and Stan is a new book by Richard McElreath that CRC Press sent me for review in CHANCE. While the book was already discussed on Andrew’s blog three months ago, and [rightly so!] enthusiastically recommended by Rasmus Bååth on Amazon, here are the reasons why I am quite impressed by Statistical Rethinking!

“Make no mistake: you will wreck Prague eventually.” (p.10)

While the book has a lot in common with Bayesian Data Analysis, from being in the same CRC series to adopting a pragmatic and weakly informative approach to Bayesian analysis, to supporting the use of Stan, it also nicely develops its own ecosystem and idiosyncrasies, with a noticeable Jaynesian bent. To start with, I like the highly personal style, with clear attempts to make the concepts memorable for students by resorting to external concepts. The best example is the call to the myth of the golem in the first chapter, which McElreath uses as a warning about the use of statistical models (which are almost anagrams of golems!). Golems and models [and robots, another concept invented in Prague!] are man-made devices that strive to accomplish the goal set for them without heeding the consequences of their actions. This first chapter of Statistical Rethinking lays the ground for the rest of the book and gets quite philosophical (albeit in a readable way!) as a result. In particular, there is a most coherent call against hypothesis testing, which by itself justifies the title of the book.