Archive for robustness

[The Art of] Regression and other stories

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , on July 23, 2020 by xi'an

CoI: Andrew sent me this new book [scheduled for 23 July on amazon] of his with Jennifer Hill and Aki Vehtari. Which I read in my garden over a few sunny morns. And as Andrew and Aki are good friends on mine, this review is definitely subjective and biased! Hence to take with a spoonful of salt.

The “other stories’ in the title is a very nice touch. And a clever idea. As the construction of regression models comes as a story to tell, from gathering and checking the data, to choosing the model specifications, to analysing the output and setting the safety lines on its interpretation and usages. I added “The Art of” in my own title as the exercise sounds very much like an art and very little like a technical or even less mathematical practice. Even though the call to the resident stat_glm R function is ubiquitous.

The style itself is very story-like, very far from a mathematical statistics book as, e.g., C.R. Rao’s Linear Statistical Inference and Its Applications. Or his earlier Linear Models which I got while drafted in the Navy. While this makes the “Stories” part most relevant, I also wonder how I could teach from this book to my own undergrad students without acquiring first (myself) the massive expertise represented by the opinions and advice on what is correct and what is not in constructing and analysing linear and generalised linear models. In the sense that I would find justifying or explaining opinionated sentences an amathematical challenge. On the other hand, it would make for a great remote course material, leading the students through the many chapters and letting them experiment with the code provided therein, creating new datasets and checking modelling assumptions. The debate between Bayesian and likelihood solutions is quite muted, with a recommendation for weakly informative priors superseded by the call for exploring the impact of one’s assumption. (Although the horseshoe prior makes an appearance, p.209!) The chapter on math and probability is somewhat superfluous as I hardly fathom a reader entering this book without a certain amount of math and stats background. (While the book warns about over-trusting bootstrap outcomes, I find the description in the Simulation chapter a wee bit too vague.) The final chapters about causal inference are quite impressive in their coverage but clearly require a significant amount of investment from the reader to truly ingest these 110 pages.

“One thing that can be confusing in statistics is that similar analyses can be performed in different ways.” (p.121)

Unsurprisingly, the authors warn the reader about simplistic and unquestioning usages of linear models and software, with a particularly strong warning about significance. (Remember Abandon Statistical Significance?!) And keep (rightly) arguing about the importance of fake data comparisons (although this can be overly confident at times). Great Chapter 11 on assumptions, diagnostics and model evaluation. And terrific Appendix B on 10 pieces of advice for improving one’s regression model. Although there are two or three pages on the topic, at the very end, I would have also appreciated a more balanced and constructive coverage of machine learning as it remains a form of regression, which can be evaluated by simulation of fake data and assessed by X validation, hence quite within the range of the book.

The document reads quite well, even pleasantly once one is over the shock at the limited amount of math formulas!, my only grumble being a terrible handwritten graph for building copters(Figure 1.9) and the numerous and sometimes gigantic square root symbols throughout the book. At a more meaningful level, it may feel as somewhat US centric, at least given the large fraction of examples dedicated to US elections. (Even though restating the precise predictions made by decent models on the eve of the 2016 election is worthwhile.) The Oscar for the best section title goes to “Cockroaches and the zero-inflated negative binomial model” (p.248)! But overall this is a very modern, stats centred, engaging and careful book on the most common tool of statistical modelling! More stories to come maybe?!

the most important statistical ideas of the past 50 years

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , , , , , , , , , , on January 10, 2020 by xi'an

A grand building entrance near the train station in HelsinkiAki and Andrew are celebrating the New Year in advance by composing a list of the most important statistics ideas occurring (roughly) since they were born (or since Fisher died)! Like

  • substitution of computing for mathematical analysis (incl. bootstrap)
  • fitting a model with a large number of parameters, using some regularization procedure to get stable estimates and good predictions (e.g., Gaussian processes, neural networks, generative adversarial networks, variational autoencoders)
  • multilevel or hierarchical modelling (incl. Bayesian inference)
  • advances in statistical algorithms for efficient computing (with a long list of innovations since 1970, including ABC!), pointing out that a large fraction was of the  divide & conquer flavour (in connection with large—if not necessarily Big—data)
  • statistical decision analysis (e.g., Bayesian optimization and reinforcement learning, getting beyond classical experimental design )
  • robustness (under partial specification, misspecification or in the M-open world)
  • EDA à la Tukey and statistical graphics (and R!)
  • causal inference (via counterfactuals)

Now, had I been painfully arm-bent into coming up with such a list, it would have certainly been shorter, for lack of opinion about some of these directions (even the Biometrika deputeditoship has certainly helped in reassessing the popularity of different branches!), and I would have have presumably been biased towards Bayes as well as more mathematical flavours. Hence objecting to the witty comment that “theoretical statistics is the theory of applied statistics”(p.10) and including Ghosal and van der Vaart (2017) as a major reference. Also bemoaning the lack of long-term structure and theoretical support of a branch of the machine-learning literature.

Maybe also more space and analysis could have been spent on “debates remain regarding appropriate use and interpretation of statistical methods” (p.11) in that a major difficulty with the latest in data science is not so much the method(s) as the data on which they are based, which in a large fraction of the cases, is not representative and is poorly if at all corrected for this bias. The “replication crisis” is thus only one (tiny) aspect of the challenge.

O-Bayes15 [day #1]

Posted in Books, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , on June 3, 2015 by xi'an

vale3So here we are back together to talk about objective Bayes methods, and in the City of Valencià as well.! A move back to a city where the 1998 O’Bayes took place. In contrast with my introductory tutorial, the morning tutorials by Luis Pericchi and Judith Rousseau were investigating fairly technical and advanced, Judith looking at the tools used in the frequentist (Bernstein-von Mises) analysis of priors, with forays in empirical Bayes, giving insights into a wide range of recent papers in the field. And Luis covering works on Bayesian robustness in the sense of resisting to over-influential observations. Following works of him and of Tony O’Hagan and coauthors. Which means characterising tails of prior versus sampling distribution to allow for the posterior reverting to the prior in case of over-influential datapoints. Funny enough, after a great opening by Carmen and Ed remembering Susie, Chris Holmes also covered Bayesian robust analysis. More in the sense of incompletely or mis-  specified models. (On the side, rekindling one comment by Susie and the need to embed robust Bayesian analysis within decision theory.) Which was also much Chris’ point, in line with the recent Watson and Holmes’ paper. Dan Simpson in his usual kick-the-anthill-real-hard-and-set-fire-to-it discussion pointed out the possible discrepancy between objective and robust Bayesian analysis. (With lines like “modern statistics has proven disruptive to objective Bayes”.) Which is not that obvious because the robust approach simply reincorporates the decision theory within the objective framework. (Dan also concluded with the comic strip below, whose message can be interpreted in many ways…! Or not.)

The second talk of the afternoon was given by Veronika Ročková on a novel type of spike-and-slab prior to handle sparse regression, bringing an alternative to the standard Lasso. The prior is a mixture of two Laplace priors whose scales are constrained in connection with the actual number of non-zero coefficients. I had not heard of this approach before (although Veronika and Ed have an earlier paper on a spike-and-slab prior to handle multicolinearity that Veronika presented in Boston last year) and I was quite impressed by the combination of minimax properties and practical determination of the scales. As well as by the performances of this spike-and-slab Lasso. I am looking forward the incoming paper!

The day ended most nicely in the botanical gardens of the University of Valencià, with an outdoor reception surrounded by palm trees and parakeet cries…

robust Bayesian FDR control with Bayes factors

Posted in Statistics, University life with tags , , , , on December 20, 2013 by xi'an

Here are a few comments on a recently arXived paper on FDRs by Xioaquan Wen (who asked for them!). Although there is less frenzy about false discovery rates in multiple testing now than in the 1990s, and I have not done anything on it since our 2004 JASA paper, this is still a topic of interest to me. Although maybe not in the formalised way the model is constructed here.

“Although the Bayesian FDR control is conceptually straightforward, its practical performance is susceptible to alternative model misspecifications. In comparison, the p-value based frequentist FDR control procedures demand only adequate behavior of p-values under the null models and generally ensure targeted FDR control levels, regardless of the distribution of p-values under the assumed alternatives.”

Now, I find the above quote of interest as it relates to Val Johnson’s argument for his uniformly most powerful “Bayesian” tests (now sufficiently discussed on the ‘Og!). This is a rather traditional criticism of using Bayes factors that they depend on the prior modelling, to the point it made it to the introduction of my tutorial yesterday. Actually, the paper has similar arguments to Johnson’s (who is quoted in the paper for earlier works) in that the criteria for validating a point estimator of the proportion of positives is highly frequentist. And does not care much about the alternative hypothesis. Besides, the modelling used therein is puzzling in that there seems to be a single parameter in the model, namely the true proportion of positives, which sounds to me as an hyper-stylised representation of real experiments. To the point of being useless… (Even if there are extra-parameters, they differ for each observation.) In addition, the argument leading to the proposed procedure is unclear: if the Bayes factors are to be consistent under the null and the proportion of positives needs an asymptotically guaranteed upper bound, the choice of a estimate equal to 1 does the job. (This is noticed on page 9.) So the presentation seems to miss a counter-factor to avoid this trivial solution.

“On the other hand, the Bayes factors from the true alternative models with reasonable powers should be, on average, greater than 1 (i.e., favoring the alternative over the null models). Therefore, the sample mean of the observed Bayes factors carries information regarding the mixture percentage.”

The estimator of this true proportion ends up being the proportion of Bayes factors less than 1, an anti-climactic proposal as it means accepting the null each time the Bayes factor is less than 1. (I did not check the proof that it overestimates the true proportion. ) Or the one of Storey (2003). However, the above quote shows it is validated only when the (true) alternative connects with the Bayes factor. So I do not see how this agrees with the robustness property of behaving well “under misspecifications of parametric alternative models”. Furthermore, in the specific framework adopted by the paper, the “misspecifications” are difficult to fathom, as they would mean that the parameter-free distributions of the observations under the alternatives are wrong and thus may render the Bayes factors to be arbitrary. Hence jeopardising the validity of the selection process. So there is something missing in the picture, I fear.

Thus, while the second half of the paper is dedicated to an extensive simulation study, what I found the most interesting direction in the paper is the question of the distribution of the Bayes factors (under the null or not), albeit not a Bayesian question, as it relates to the use and the calibration of ABC model choice (and the proposal by Fearnhead and Prangle of using the Bayes factor as the summary statistics). The fact that the (marginal) expectation of the Bayes factor under the null (marginal) is noteworthy but not as compelling as the author argues, because (a) it is only an expectation and (b) it tells nothing about the alternative. The distribution of the Bayes factor does depend upon the alternative through the Bayes factor, so mileage [of the quantile Bayes factor] may vary (as shown by the assumption “for Bayes factors with reasonable power”, p.14). Drawing Bayesian inference based on Bayes factors only is nonetheless an area worth investigating!

Bayesian brittleness, again

Posted in Books, Statistics with tags , , , on September 11, 2013 by xi'an

“With the advent of high-performance computing, Bayesian methods are increasingly popular tools for the quantification of uncertainty throughout science and industry. Since these methods impact the making of sometimes critical decisions in increasingly complicated contexts, the sensitivity of their posterior conclusions with respect to the underlying models and prior beliefs is becoming a pressing question.”

A second paper by Owhadi, Scovel and Sullivan on Bayesian brittleness has just been arXived. This one has the dramatic title `When Bayesian inference shatters‘..! If you remember (or simply check) my earlier post, the topic of this work is the robustness of Bayesian inference under model mispsecification, robustness which is completely lacking from the authors’ perspective. This paper is much shorter than the earlier one (and sounds like a commentary on it), but it concludes in a similar manner, namely that Bayesian inference suffers from `maximal brittleness under local mis-speci cation’ (p.6), which means that `the range of posterior predictions among all admissible priors is as wide as the deterministic range of the quantity of interest’ when the true model is not within the range of the parametric models covered by the prior distribution.  The novelty in the paper appears to be in the extension that, even when we consider only the k first moments of the unknown distribution, Bayesian inference is not robust (this is called the Brittleness Theorem, p.9). As stated earlier, while I appreciate this sort of theoretical derivation, I am somehow dubious as to whether or not this impacts the practice of Bayesian statistics to the amount mentioned in the above quote. In particular, I do not see how those results cast more doubts on the impact of the prior modelling on the posterior outcome. While we all (?) agree on the fact that “any given prior and model can be slightly perturbed to achieve any desired posterior conclusion”, the repeatability or falsifiability of the Bayesian experiment (change your prior and run the experiment afresh) allows for an assessment of the posterior outcome that prevents under-the-carpet effects.