Error and Inference [#3]

(This is the third post on Error and Inference, and once again a raw, naïve reaction to a linear reading rather than a deeper and more informed criticism.)

“Statistical knowledge is independent of high-level theories.”—A. Spanos, p.242, Error and Inference, 2010

The sixth chapter of Error and Inference is written by Aris Spanos and deals with the issues of testing in econometrics. On the one hand, it provides a fairly interesting entry into the history of econometrics and its resistance to data-backed theories, primarily because the buffers between data and theory are manifold (“huge gap between economic theories and the available observational data“, p.203). On the other hand, what I fail to understand in the chapter is the meaning of theory, as it seems very distinct from what I would call a (statistical) model. The sentence “statistical knowledge, stemming from a statistically adequate model allows data to `have a voice of its own’ (…) separate from the theory in question and it succeeds in securing the frequentist goal of objectivity in theory testing” (p.206) is puzzling in this respect. (Actually, I would have liked to see a clear meaning given to this “voice of its own”, as it otherwise sounds mostly like a catchy phrase…) Similarly, Spanos distinguishes between three types of models: primary/theoretical; experimental/structural, where “the structural model contains a theory’s substantive subject matter information in light of the available data” (p.213); and data/statistical, where “the statistical model is built exclusively using the information contained in the data” (p.213). I have trouble understanding how testing can distinguish between those types of models: as a naïve reader, I would have thought that only the statistical model could be tested by a statistical procedure, even though I would not call the above a proper definition of a statistical model (esp. since Spanos writes a few lines below that the statistical model “would embed (nest) the structural model in its context” (p.213)). The normal example developed on pages 213-217 does not help [me] make sense of this distinction: it simply illustrates the impact of failing some of the defining assumptions (normality, time homogeneity [in mean and variance], independence).
(As an aside, the discussion about the poor estimation of the correlation on pp.214-215 does not help, because it involves a second variable Y that is not defined for this example.) It would of course be nice if the “noise” in a statistical/econometric model could be studied in complete separation from the structure of this model; however, the two seem too irremediably intermingled to allow this partition of roles. I thus do not see how the “statistically adequate model is independent from the substantive information” (p.217), i.e. by which rigorous process one can isolate the “chance” parts of the data to build and validate a statistical model per se. The simultaneous equation model (SEM, pp.230-231) is more illuminating about the distinction Spanos draws between structural and statistical models/parameters, even though the difference in this case boils down to a question of identifiability.
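The kind of assumption failure illustrated by the normal example can be sketched in a few lines of code. (This is a toy simulation of my own, not from the chapter; the half-sample variance ratio and lag-1 autocorrelation below are crude stand-ins for the formal misspecification tests Spanos advocates.)

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Data satisfying the normal model's assumptions: i.i.d. N(0, 1)
good = rng.normal(loc=0.0, scale=1.0, size=n)

# Data violating time homogeneity: the standard deviation doubles halfway through
bad = np.concatenate([rng.normal(0.0, 1.0, n // 2),
                      rng.normal(0.0, 2.0, n // 2)])

def variance_ratio(x):
    """Crude homogeneity diagnostic: second-half variance over first-half variance."""
    half = len(x) // 2
    return np.var(x[half:]) / np.var(x[:half])

def lag1_autocorr(x):
    """Crude independence diagnostic: lag-1 sample autocorrelation."""
    xc = x - x.mean()
    return np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)

print(variance_ratio(good))   # close to 1 when homogeneity holds
print(variance_ratio(bad))    # close to 4, flagging the variance shift
print(lag1_autocorr(good))    # close to 0 under independence
```

The point of the sketch is only that such diagnostics operate on the data alone, with no reference to any substantive theory, which is presumably what the chapter means by the data having “a voice of its own”.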

“What is needed is a methodology of error inquiry that encourages detection and identification of the different ways an inductive inference could be in error by applying effective procedures that would detect such errors when present with very high probability.”—A. Spanos, p.241, Error and Inference, 2010

The chapter, in line with the book, is strongly entrenched within the “F-N-P frequentist paradigm” (p.210). Obviously, there are major differences between the Fisherian and the Neyman-Pearson approaches to testing that are not addressed in the chapter, the main opposition being the role (or non-role) of the p-value. The recurrent (and relevant) worry of Spanos about model misspecification is not directly addressed by either of those. The extremely strong criticisms of “cookbook” econometrics textbooks (p.233) could thus equally be directed at most statistics books and papers: I do not see how the “error statistical perspective” could spot all departures from model assumptions. Section 6.3, comparing covariate-dependent models à la Cowles Commission with standard autoregressive models, is thus puzzling because (a) they do not seem particularly comparable to me, for the very reason dismissed by Spanos that “ARIMA models ignore all substantive information”, and (b) time-series models may just as well be misspecified. To think that adding a linear time-dependence to a regression model is sufficient to solve the issues, as argued by Spanos (“…what distinguishes [this] approach from other more data-oriented traditions is its persistent emphasis on justifying the methodological foundations of its procedures using the scientific credentials of frequentist inference“, p.238), is a rather radical shortcut to a justification of the approach.
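To make point (b) concrete, here is another toy simulation of my own (not from the chapter): a regression with a linear time-dependence is fitted to data whose errors follow an AR(1) process. The trend is recovered, but the residuals remain strongly autocorrelated, so the fitted model is still misspecified.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# AR(1) errors with strong persistence (rho = 0.8)
eps = np.empty(n)
eps[0] = rng.normal()
for t in range(1, n):
    eps[t] = 0.8 * eps[t - 1] + rng.normal()

t_index = np.arange(n, dtype=float)
y = 1.0 + 0.05 * t_index + eps   # linear trend plus AR(1) noise

# Least-squares fit of intercept and slope
X = np.column_stack([np.ones(n), t_index])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation."""
    xc = x - x.mean()
    return np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)

print(beta)                   # slope estimate near the true 0.05
print(lag1_autocorr(resid))   # remains well above zero: dependence survives detrending
```

Detrending handles the linear time-dependence but leaves the serial dependence in the errors untouched, which is exactly why adding a trend term cannot by itself secure a statistically adequate model.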
