Archive for Karl Popper

beyond subjective and objective in Statistics

Posted in Books, Statistics, University life on August 28, 2015 by xi'an

“At the level of discourse, we would like to move beyond a subjective vs. objective shouting match.” (p.30)

This paper by Andrew Gelman and Christian Hennig calls for the abandonment of the terms objective and subjective in (not solely Bayesian) statistics. It argues that there is more than mere prior information and data to the construction of a statistical analysis. The paper is articulated as the authors' proposal, followed by four application examples, then a survey of the philosophy-of-science perspectives on objectivity and subjectivity in statistics and other sciences, then a study of the subjective and objective aspects of the mainstream statistical schools, concluding with a discussion on the implementation of the proposed move.

inflation, evidence and falsifiability

Posted in Books, pictures, Statistics, University life on July 27, 2015 by xi'an

[Ewan Cameron pointed me to this paper and blogged about his impressions a few weeks ago. And then Peter Coles wrote a (properly) critical blog entry yesterday. Here are my quick impressions, as an add-on.]

“As the cosmological data continues to improve with its inevitable twists, it has become evident that whatever the observations turn out to be they will be lauded as ‘proof of inflation’.” G. Gubitosi et al.

In an arXiv paper with the above title, Gubitosi et al. embark upon a generic and critical [and astrostatistical] evaluation of Bayesian evidence and the Bayesian paradigm. Perfect topic and material for another blog post!

“Part of the problem stems from the widespread use of the concept of Bayesian evidence and the Bayes factor (…) The limitations of the existing formalism emerge, however, as soon as we insist on falsifiability as a pre-requisite for a scientific theory (….) the concept is more suited to playing the lottery than to enforcing falsifiability: winning is more important than being predictive.” G. Gubitosi et al.

It is somehow quite hard not to quote most of the paper, because prose such as the above abounds. Now, compared with the standard setting, the authors introduce a higher level than models, called paradigms, as collections of models. (I wonder what the next level is, monads? universes? paradises?) Each paradigm is associated with a marginal likelihood, obtained by integrating over models and model parameters. Which is also the evidence of or for the paradigm. And then, assuming a prior on the paradigms, one can compute the posterior over the paradigms… What is the novelty, then, that “forces” falsifiability upon Bayesian testing (or the reverse)?!
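As I read it, the paradigm construction amounts to one more layer of averaging, which can be sketched in a few lines (all evidences and weights below are made-up numbers, purely for illustration; this is not the authors' code):

```python
import numpy as np

def paradigm_evidence(model_evidences, model_priors):
    """Marginal likelihood of a paradigm: sum_m p(m) p(y | m),
    i.e. the model evidences averaged under the within-paradigm model prior."""
    return float(np.dot(model_priors, model_evidences))

# two hypothetical paradigms, each a small collection of models
ev_A = paradigm_evidence([0.12, 0.30, 0.05], [0.5, 0.3, 0.2])
ev_B = paradigm_evidence([0.20, 0.18], [0.6, 0.4])

# posterior over paradigms under a uniform paradigm prior, by Bayes' theorem
prior = np.array([0.5, 0.5])
post = prior * np.array([ev_A, ev_B])
post = post / post.sum()
```

Nothing in the mechanics differs from ordinary Bayesian model averaging, which is precisely why the claimed novelty is unclear to me.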

“However, science is not about playing the lottery and winning, but falsifiability instead, that is, about winning given that you have bore the full brunt of potential loss, by taking full chances of not winning a priori. This is not well incorporated into the Bayesian evidence because the framework is designed for other ends, those of model selection rather than paradigm evaluation.” G. Gubitosi et al.

The paper starts with a criticism of the Bayes factor in the point-null test of a Gaussian mean, as overly penalising the null against the alternative being only a power law. Not much new there, it is well known that the Bayes factor does not converge at the same speed under the null and under the alternative… The first proposal of the authors is to consider the distribution of the marginal likelihood of the null model under the [or a] prior predictive encompassing both hypotheses or only the alternative [there is a lack of precision at this stage of the paper], in order to calibrate the observed value against the expected one. What is the connection with falsifiability? The notion that, under the prior predictive, most of the mass is on very low values of the evidence, leading to a conclusion against the null. If the null is replaced with the alternative, the mass of the marginal likelihood then concentrates on the largest values of the evidence, which is translated as an unfalsifiable theory. In simpler terms, it means you can never prove a mean θ is different from zero. Not a tremendous item of news, all things considered…
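The calibration idea can be sketched for the point-null Gaussian case, under my reading of the proposal and with a made-up prior scale τ (a sketch, not the authors' implementation): simulate data from the prior predictive of the alternative H1: θ ~ N(0, τ²), y | θ ~ N(θ, 1), and place an observed null evidence p(y | H0) = N(y; 0, 1) within the simulated distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, n_sim = 3.0, 10_000   # tau is an assumed prior scale, for illustration only

def phi(x):
    """Standard normal density."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# the prior predictive under H1 is N(0, 1 + tau^2)
y = rng.normal(0.0, np.sqrt(1.0 + tau**2), size=n_sim)

# evidence of the point null H0: theta = 0 for each simulated dataset
null_evidence = phi(y)

# calibrate a hypothetical observation y_obs = 2.5 against this distribution
obs_evidence = phi(2.5)
tail = np.mean(null_evidence <= obs_evidence)
```

With a spread-out prior most simulated evidences are tiny, which is the phenomenon the authors translate into a falsifiability statement; the simulation mostly restates the usual behaviour of the evidence under a vague alternative.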

“…we can measure the predictivity of a model (or paradigm) by examining the distribution of the Bayesian evidence assuming uniformly distributed data.” G. Gubitosi et al.

The alternative is to define a tail probability for the evidence, i.e., the probability of being below an arbitrarily set bound. What remains unclear to me in this notion is the definition of a prior on the data, as it seems to be model dependent, hence prohibiting comparisons between models since these would involve incompatible priors. The paper goes further in that direction by penalising models according to their predictivity, P, as exp{-(1-P²)/P²}. And paradigms as well.
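For concreteness, the penalty quoted above behaves as follows (my reading of the formula, with P the model's predictivity): it equals 1 for a perfectly predictive model and vanishes rapidly as P decreases, which is where the arbitrariness I complain about below enters.

```python
import math

def predictivity_penalty(P):
    """Penalty exp{-(1 - P^2) / P^2}: equal to 1 at P = 1,
    decaying to 0 as P -> 0 (unpredictive models are downweighted)."""
    return math.exp(-(1.0 - P**2) / P**2)

w_good = predictivity_penalty(1.0)   # no penalty for a fully predictive model
w_poor = predictivity_penalty(0.5)   # exp(-3), already a heavy downweighting
```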

“(…) theoretical matters may end up being far more relevant than any probabilistic issues, of whatever nature. The fact that inflation is not an unavoidable part of any quantum gravity framework may prove to be its greatest undoing.” G. Gubitosi et al.

Establishing a principled way to weight models would certainly be a major step in the validation of posterior probabilities as a quantitative tool for Bayesian inference, as hinted at in my 1993 paper on the Lindley-Jeffreys paradox, but I do not see such a principle emerging from the paper. Not only because of the arbitrariness in constructing both the predictivity and the associated prior weight, but also because of the impossibility of defining a joint predictive, that is, a predictive across models, without including the weights of those models. This makes the prior probabilities appear on “both sides” of the defining equation… (And I will not mention the issues of constructing a prior distribution on a Bayes factor that are related to Aitkin's integrated likelihood. And obviously won't try to enter the cosmological debate about inflation.)

can we trust computer simulations?

Posted in Books, pictures, Statistics, University life on July 10, 2015 by xi'an


How can one validate the outcome of a simulation model? Or can we even imagine validation of this outcome? This was the starting question for the conference I attended in Hannover. Which obviously engaged me to the utmost. Relating to some past experiences, like advising a student working on accelerated tests for fighter electronics. And failing to agree with him on validating a model to turn those accelerated tests into a realistic setting. Or reviewing this book on climate simulation three years ago while visiting Monash University. Since I discuss in detail below most talks of the day, here is an opportunity to opt away!

eliminating an important obstacle to creative thinking: statistics…

Posted in Books, Kids, Statistics, University life on March 12, 2015 by xi'an

“We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking.”

About a month ago, David Trafimow and Michael Marks, the current editors of the journal Basic and Applied Social Psychology, published an editorial banning all null hypothesis significance testing procedures (acronym-ed into the ugly NHSTP, which sounds like a particularly nasty venereal disease!) from papers published by the journal. My first reaction was “Great! This will bring more substance to the papers by preventing significance fishing and undisclosed multiple testing! Power to the statisticians!” However, after reading the said editorial, I realised it was inspired by a nihilistic anti-statistical stance, backed by an apparent lack of understanding of the nature of statistical inference, rather than a call for saner and safer statistical practice. The editors most clearly state that inferential statistical procedures are no longer needed to publish in the journal, only “strong descriptive statistics”. Maybe to keep in tune with the “Basic” in the name of the journal!

“In the NHSTP, the problem is in traversing the distance from the probability of the finding, given the null hypothesis, to the probability of the null hypothesis, given the finding. Regarding confidence intervals, the problem is that, for example, a 95% confidence interval does not indicate that the parameter of interest has a 95% probability of being within the interval.”

The above quote could be a motivation for a Bayesian approach to the testing problem, a revolutionary stance for journal editors!, but it only illustrates that the editors wish for a procedure that would eliminate the uncertainty inherent to statistical inference, i.e., to decision making under… erm, uncertainty: “The state of the art remains uncertain.” To fail to separate significance from certainty is fairly appalling from an epistemological perspective and should be a case for impeachment, were any such thing to exist for a journal board. This means the editors cannot distinguish data from parameter and model from reality! Even more fundamentally, to bar statistical procedures from being used in a scientific study is nothing short of reactionary. While encouraging the inclusion of data is a step forward, restricting the validation or invalidation of hypotheses to gazing at descriptive statistics is many steps backward and completely jeopardizes the academic reputation of the journal, whose editorial may end up being its last quoted paper. Is deconstruction now reaching psychology journals?! To quote from a critic of this approach, “Thus, the general weaknesses of the deconstructive enterprise become self-justifying. With such an approach I am indeed not sympathetic.” (Searle, 1983).
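The confidence-interval point in the editors' quote is at least technically correct, and worth a two-minute simulation (with made-up numbers, for illustration): a 95% interval is a long-run coverage property of the procedure over repeated samples, not a 95% probability statement about the parameter given a single dataset.

```python
import random

random.seed(1)
theta, n, reps = 2.0, 25, 2000   # fixed "true" mean, sample size, repetitions
half = 1.96 / n ** 0.5           # half-width of the 95% interval for N(theta, 1) data

hits = 0
for _ in range(reps):
    ybar = sum(random.gauss(theta, 1.0) for _ in range(n)) / n
    if ybar - half <= theta <= ybar + half:
        hits += 1

coverage = hits / reps  # close to 0.95 across repetitions
```

That the procedure covers in about 95% of repetitions is exactly the frequentist guarantee; turning it into a probability statement about θ given the data is what requires a prior, i.e., the Bayesian move the editors shy away from.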

“The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist (…) With respect to Bayesian procedures, we reserve the right to make case-by-case judgments, and thus Bayesian procedures are neither required nor banned from BASP.”

The section on Bayesian approaches tries to be sympathetic to the Bayesian paradigm but again reflects the poor understanding of the authors. By “Laplacian assumption”, they mean Laplace's Principle of Indifference, i.e., the use of uniform priors, which has not been seriously considered a sound principle since the mid-1930s. Except maybe in recent papers of Trafimow. I also love the notion of “generat[ing] numbers where none exist”, as if the prior distribution had to be grounded in some physical reality! Although it is meaningless, it has some poetic value… (Plus, bringing Popper and Fisher to the rescue sounds like shooting Bayes himself in the foot.) At least, the fact that the editors will consider Bayesian papers on a case-by-case basis indicates they may engage in a subjective Bayesian analysis of each paper rather than using an automated p-value against the 100% rejection bound!

[Note: this entry was suggested by Alexandra Schmidt, current ISBA President, towards an incoming column on this decision of Basic and Applied Social Psychology for the ISBA Bulletin.]


Philosophy of Science, a very short introduction (and review)

Posted in Books, Kids, Statistics, Travel on November 3, 2013 by xi'an

When visiting the bookstore on the campus of the University of Warwick two weeks ago, I spotted this book, Philosophy of Science, a very short introduction, by Samir Okasha, and the “bargain” offer of getting two books for £10 enticed me to buy it along with a Friedrich Nietzsche, a very short introduction… (Maybe with the irrational hope that my daughter would take a look at those for her philosophy course this year!)

“Popper’s attempt to show that science can get by without induction does not succeed.” (p.23)

Since this is [unsurprisingly!] a very short introduction, I did not get much added value from the book. Nonetheless, it was an easy read for short trips in the metro and short waits here and there. And it would be a good [very short] introduction for anyone newly interested in the philosophy of science. The first chapter tries to define what science is, with reference to the authority of Popper (and a mere mention of Wittgenstein), and concludes that there is no clear-cut demarcation between science and pseudo-science. (Mathematics apparently does not constitute a science: “Physics is the most fundamental science of all”, p.55.) I would have liked to see the quote from Friedrich Nietzsche

“It is perhaps just dawning on five or six minds that physics, too, is only an interpretation and exegesis of the world (to suit us, if I may say so!) and not a world-explanation.”

in Beyond Good and Evil, as it illustrates the main point of the chapter, and maybe of the book, that scientific theories can never be proven true. Plus, it is often misinterpreted as an anti-science statement by Nietzsche. (Plus, it links both books I bought!)

simulating Nature

Posted in Books, Statistics on July 25, 2012 by xi'an

This book, Simulating Nature: A Philosophical Study of Computer-Simulation Uncertainties and Their Role in Climate Science and Policy Advice, by Arthur C. Petersen, was sent to me twice by the publisher for reviewing it for CHANCE. As I could not find a nearby “victim” to review the book, I took it with me to Australia and read it by bits and pieces along the trip.

“Models are never perfectly reliable, and we are always faced with ontic uncertainty and epistemic uncertainty, including epistemic uncertainty about ontic uncertainty.” (page 53)

The author, Arthur C. Petersen, was a member of the United Nations' Intergovernmental Panel on Climate Change (IPCC) and works as chief scientist at the PBL Netherlands Environmental Assessment Agency. He mentions that the first edition of this book, Simulating Nature, achieved some kind of cult status while now being out of print, which is why he wrote this second edition. The book centres on the notion of uncertainty connected with computer simulations in the first part (pages 1-94) and on the same analysis applied to the simulation of climate change, based on the experience of the author, in the second part (pages 95-178). I must warn the reader that, as the second part got too focussed and acronym-filled for my own taste, I did not read it in depth, even though the issues of climate change and of the human role in this change are definitely of interest to me. (Readers of CHANCE must also realise that there is very little connection with Statistics in this book or my review of it!) Note that the final chapter is actually more of a neat summary of the book than a true conclusion, so a reader eager to get an idea of the contents of the book can grasp them through the eight pages of the eighth chapter.

“An example of the latter situation is a zero-dimensional (sic) model that aggregates all surface temperatures into a single zero-dimensional (re-sic) variable of globally averaged surface temperature.” (page 41)

The philosophical questions of interest therein are that a computer simulation of reality is not reproducing reality and that the uncertainty(ies) pertaining to this simulation cannot be assessed in its (their) entirety. (This is the inherent meaning of the first quote, epistemic uncertainty relating to our lack of knowledge about the genuine model reproducing Nature or reality…) The author also covers the more practical issue of the interface between scientific reporting and policy making, which reminded me of Christl Donnelly's talk at the ASC 2012 meeting (about cattle epidemics in England). The book naturally does not bring answers to any of those questions, naturally because a philosophical perspective should consider different sides of the problem, but I find the book more interested in typologies and classifications (of types of uncertainties, in crossing those uncertainties with panel attitudes, &tc.) than in the fundamentals of simulation. I am obviously incompetent in the matter, however, as a naïve bystander, it does not seem to me that the book makes any significant progress towards setting epistemological and philosophical foundations for simulation. The part connected with the author's involvement in the IPCC sheds more light on the difficulties of operating in committees and panels made of members with heavy political agendas than on the possible assessments of uncertainties within the models adopted by climate scientists… With the same proviso as above, the philosophical aspects do not seem very deep: the (obligatory?!) reference to Karl Popper does not bring much to the debate, because what is falsification to simulation? Similarly, Lakatos' prohibition of “direct[ing] the modus tollens at [the] hard core” (page 40) does not turn into a methodological assessment of simulation praxis.

“I argue that the application of statistical methods is not sufficient for adequately dealing with uncertainty.” (page 18)

“I agree (…) that the theory behind the concepts of random and systematic errors is purely statistical and not related to the locations and other dimensions of uncertainty.” (page 55)

Statistics is mostly absent from the book, apart from the remark that statistical uncertainty (understood as the imprecision induced by a finite amount of data) differs from modelling errors (the model is not reality), which the author considers cannot be handled by statistics (stating that Deborah Mayo's theory of statistical error analysis cannot be extended to simulation, see the footnote on page 55). [In other words, this book has no connection with Monte Carlo Statistical Methods! With or without capitals… Except for a mention of ‘real’ random number generators in one of many footnotes on page 35.] Mention is made of “subjective probabilities” (page 54), presumably meaning a Bayesian perspective. But the distinction between statistical uncertainty and scenario uncertainty, which “cannot be adequately described in terms of chances or probabilities” (page 54), misses the Bayesian perspective altogether, as does the following sentence that “specifying a degree of probability or belief [in such uncertainties] is meaningless since the mechanism that leads to the events are not sufficiently known” (page 54).

“Scientists can also give their subjective probability for a claim, representing their estimated chance that the claim is true. Provided that they indicate that their estimate for the probability is subjective, they are then explicitly allowing for the possibility that their probabilistic claim is dependent on expert judgement and may actually turn out to be false.” (page 57)

In conclusion, I fear the book does not bring enough of a conclusion on the philosophical justifications of using a simulation model instead of the actual reality and on the more pragmatic aspects of validating/invalidating a computer model and of correcting its imperfections with regard to data/reality. I am quite conscious that this is an immensely delicate issue and that, were it to be entirely solved, the current fight between climate scientists and climatoskeptics would not persist. As illustrated by the “Sound Science debate” (pages 68-70), politicians and policy-makers are very poorly equipped to deal with uncertainty, and even less with decision under uncertainty. I however do not buy the (fuzzy and newspeak) concept of “post-normal science” developed in the last part of Chapter 4, where the scientific analysis of a phenomenon is abandoned for decision-making, “not pretend[ing] to be either value-free or ethically neutral” (page 75).

Error and Inference [#4]

Posted in Books, Statistics on September 21, 2011 by xi'an

(This is the fourth post on Error and Inference, once again a raw and naïve reaction following a linear and slow reading of the book, rather than a deeper and more informed criticism.)

“The defining feature of an inductive inference is that the premises (evidence statements) can be true while the conclusion inferred may be false without a logical contradiction: the conclusion is ‘evidence transcending’.”—D. Mayo and D. Cox, p.249, Error and Inference, 2010

The seventh chapter of Error and Inference, entitled “New perspectives on (some old) problems of frequentist statistics”, is divided into four parts, written by David Cox, Deborah Mayo and Aris Spanos, in different orders and groups of authors. This is certainly the most statistical of all the chapters, not a surprise considering that David Cox is involved, and I thus have difficulty explaining why it took me so long to read through it… Overall, this chapter is quite important for its contribution to the debate on the nature of statistical testing.

“The advantage in the modern statistical framework is that the probabilities arise from defining a probability model to represent the phenomenon of interest. Had Popper made use of the statistical testing ideas being developed at around the same time, he might have been able to substantiate his account of falsification.”—D. Mayo and D. Cox, p.251, Error and Inference, 2010

The first part of the chapter is Mayo's and Cox's “Frequentist statistics as a theory of inductive inference”. It was first published in the 2006 Erich Lehmann symposium and is available online as an arXiv paper. There is absolutely no attempt there to link or clash with the Bayesian approach; this paper is only looking at frequentist statistical theory as the basis for inductive inference. The debate therein about deducing that H is correct from a dataset successfully facing a statistical test is classical (in both senses) but I [unsurprisingly] remain unconvinced by the arguments. The null hypothesis remains the calibrating distribution throughout the chapter, with very little (or at least not enough) consideration of what happens when the null hypothesis does not hold. Section 3.6, about confidence intervals being another facet of testing hypotheses, is representative of this perspective. The p-value is defended as the central tool for conducting hypothesis assessment. (In this version of the paper, some p's are written in roman characters and others in italics, which is a wee confusing until one realises that this is a mere typo!) The fundamental imbalance problem, namely that, for contiguous hypotheses, a test cannot be expected both to most often reject the null when it is [very moderately] false and to most often accept the null when it is right, is not discussed there. The argument about substantive nulls in Section 3.5 considers a stylised case of well-separated scientific theories, however the real world of models is more similar to a greyish (and more Popperian?) continuum of possibles. In connection with this, I would have thought it more likely that the book would address on philosophical grounds Box's aphorism that “all models are wrong”. Indeed, one (philosophical?) difficulty with the p-values and the frequentist evidence principle (FEV) is that they rely on the strong belief that one given model can be exact or true (while criticising the subjectivity of the prior modelling in the Bayesian approach). Even in the typology of types of null hypotheses drawn by the authors in Section 3, the “possibility of model misspecification” is addressed in terms of the low power of an omnibus test, while agreeing that “an incomplete probability specification” is unavoidable (an argument found at several places in the book, that the alternative cannot be completely specified).

“Sometimes we can find evidence for H0, understood as an assertion that a particular discrepancy, flaw, or error is absent, and we can do this by means of tests that, with high probability, would have reported a discrepancy had one been present.”—D. Mayo and D. Cox, p.255, Error and Inference, 2010

The above quote relates to the Failure and Confirmation section, where the authors try to push the argument in favour of frequentist tests one step further, namely that “moderate p-values” may sometimes be used as confirmation of the null. (I may have misunderstood, the end of the section defending a purely frequentist, as in repeated experiments, interpretation. This reproduces an earlier argument about the nature of probability in Section 1.2, as characterising the “stability of relative frequencies of results of repeated trials”.) In fact, this chapter and other recent readings made me think afresh about the nature of probability, a debate that put me off so much in Keynes (1921) and even in Jeffreys (1939). From a mathematical perspective, there is only one “kind” of probability, the one defined via a reference measure and a probability, whether it applies to observations or to parameters. From a philosophical perspective, there is a natural issue about the “truth” or “realism” of the probability quantities and of the probabilistic statements. The book, and in particular this chapter, considers that a truthful probability statement is one agreeing with “a hypothetical long-run of repeated sampling, an error probability”, while the statistical inference school of Keynes (1921), Jeffreys (1939), and Carnap (1962) “involves quantifying a degree of support or confirmation in claims or hypotheses”, which makes this (Bayesian) approach sound less realistic… Obviously, I have no ambition to solve this long-running debate, however I see no reason for the first approach to be more realistic by being grounded on stable relative frequencies à la von Mises. If nothing else, the notion that a test should be evaluated on its long-run performances is very idealistic, as the concept relies on an ever-repeating, infinite sequence of identical trials. Relying on probability measures as self-coherent mathematical measures of uncertainty carries (for me) as much (or as little) reality as the above infinite experiment. Now, the paper is not completely entrenched in this interpretation, when it concludes that “what makes the kind of hypothetical reasoning relevant to the case at hand is not the long-run low error rates associated with using the tool (or test) in this manner; it is rather what those error rates reveal about the data generating source or phenomenon” (p.273).

“If the data are so extensive that accordance with the null hypothesis implies the absence of an effect of practical importance, and a reasonably high p-value is achieved, then it may be taken as evidence of the absence of an effect of practical importance.”—D. Mayo and D. Cox, p.263, Error and Inference, 2010

The paper mentions several times conclusions to be drawn from a p-value near one, as in the above quote. This is an interpretation that does not sit well with my understanding of p-values being distributed as uniforms under the null: very high p-values should be as suspicious as very low p-values. (This criticism is not new, of course.) Unless one does not strictly adhere to the null model, which brings back the above issue of the approximativeness of any model… I also found it fascinating to read the criticism that “power appertains to a prespecified rejection region, not to the specific data under analysis”, as I thought this equally applied to the p-values, turning “the specific data under analysis” into a departure event of a prespecified kind.
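The uniformity claim is easy to check by simulation (a quick sketch with made-up settings, nothing more): under the null, the p-value of a correctly calibrated test is Uniform(0,1), so p-values near 1 occur just as often as p-values near 0.

```python
import math
import random

random.seed(7)
n, reps = 30, 5000   # sample size and number of simulated datasets

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

pvals = []
for _ in range(reps):
    # z-statistic of the sample mean under H0: theta = 0, unit variance
    z = sum(random.gauss(0.0, 1.0) for _ in range(n)) / math.sqrt(n)
    pvals.append(2.0 * (1.0 - Phi(abs(z))))

low = sum(p < 0.05 for p in pvals) / reps    # about 0.05
high = sum(p > 0.95 for p in pvals) / reps   # also about 0.05
```

Both tails carry the same 5% of mass, which is why reading a p-value near one as confirmation of the null is no more warranted than reading a p-value near zero as refutation of the alternative.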

(Given the unreasonable length of the above, I fear I will continue my snail-paced reading in yet another post!)
