## the anti-Bayesian moment and its passing commented

**H**ere is a comment from Deborah Mayo on “the anti-Bayesian moment and its passing”, the rejoinder written with Andrew Gelman, a comment that could not make it through the blog’s comment system:

You assume that I am interested in long-term average properties of procedures, even though I have so often argued that they are at most necessary (as consequences of good procedures), but scarcely sufficient for a severity assessment. The error statistical account I have developed is a statistical philosophy. It is not one to be found in Neyman and Pearson, jointly or separately, except in occasional glimpses here and there (unfortunately). It is certainly not about well-defined accept-reject rules. If N-P had only been clearer, and Fisher better behaved, we would not have had decades of wrangling. However, I have argued, the error statistical philosophy explicates, and directs the interpretation of, frequentist sampling theory methods in scientific, as opposed to behavioural, contexts. It is not a complete philosophy…but I think Gelmanian Bayesians could find in it a source of “standard setting”.

You say “the prior is both a probabilistic object, standard from this perspective, and a subjective construct, translating qualitative personal assessments into a probability distribution. The extension of this dual nature to the so-called “conventional” priors (a very good semantic finding!) is to set a reference … against which to test the impact of one’s prior choices and the variability of the resulting inference. …they simply set a standard against which to gauge our answers.”

I think there are standards for even an approximate meaning of “standard-setting” in science, and I still do not see how an object whose meaning and rationale may fluctuate wildly, even in a given example, can serve as a standard or reference. For what?

Perhaps the idea is that one can gauge how different priors change the posteriors, because, after all, the likelihood is well-defined. That is why the prior and not the likelihood is the camel. But it isn’t obvious why I should want the camel. (camel/gnat references in the paper and response).

March 12, 2013 at 7:40 am

Apologies for the assumption that you were “interested in long-term average properties of procedures”! For me, this sounded like the basic tenet of a frequentist credo, i.e. the validation of the choice of a procedure by its long-term properties, e.g. coverage. This is von Mises’ view on (frequentist) statistics, as far as I can judge (I could not finish his book!)…

March 12, 2013 at 8:11 am

I meant “interested only in long-term properties”. They are not sufficient to validate the choice of a procedure… they may not yield severity, for example.

March 12, 2013 at 8:31 am

Ah, stay tuned: more about severity in the comments on Aris Spanos’ “Who should be afraid of the Jeffreys-Lindley paradox?”, tomorrow on the ‘Og!

March 12, 2013 at 8:13 am

thanks for the pics.

March 12, 2013 at 4:16 am

“You assume that I am interested in long-term average properties of procedures, even though I have so often argued that they are at most necessary”

But those long-term average properties aren’t even close to being necessary.

To get those “long-term average properties” you have to assume something like a “data generating mechanism” which has some stable relation between past and future. This is an enormously strong assumption about the universe, which generally doesn’t hold and is absolutely uncheckable before the fact. No matter how much the relationship has held in the past it can fail at any moment, thereby completely destroying those “long-term average properties”.

Bayesian inference, however, rests on much weaker assumptions, which can definitely be known to be true when the inference is made. Inferences involving errors rest on partial, but true and known, information about the real errors in the actual data taken. Sure, the Bayesian loses those “long-term average properties”, but they were a complete fantasy anyway; in exchange they gain reasonable guesses, based on objectively true knowledge, about things that actually happened.

For example, if we take measurements with a ruler with divisions down to the centimeter level, then we know from eyeballing the ruler that we can get errors on the order of 1mm. This is actual knowledge about the errors and not some fantasy assumption. So if we assume a distribution of IID normal N(0,1mm), we’re guaranteed the actual vector of errors in the data is in the high-probability manifold of this distribution. That’s enough to get good inferences using this distribution, and it’s completely irrelevant whether the past, present, or future errors with this ruler have a histogram that looks like N(0,1mm).
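The “high-probability manifold” claim can be checked numerically. A minimal sketch, not part of the original comment: the ±0.8mm error vector used later in this thread is purely illustrative, and the comparison against Monte Carlo draws is one (assumed) way of operationalising “high-probability region”.

```python
import math
import random

random.seed(1)

# Illustrative error vector: each error on the order of 1 mm,
# though clearly not an actual IID normal sample.
errors = [0.8, 0.8, 0.8, -0.8, -0.8, -0.8]  # mm

def iid_normal_logpdf(xs, sigma=1.0):
    """Log-density of a vector under IID N(0, sigma)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2) - x * x / (2 * sigma**2)
               for x in xs)

actual = iid_normal_logpdf(errors)

# Log-densities of 10,000 vectors genuinely drawn from IID N(0, 1 mm):
draws = [iid_normal_logpdf([random.gauss(0.0, 1.0) for _ in range(len(errors))])
         for _ in range(10_000)]
first_pct = sorted(draws)[len(draws) // 100]  # 1st percentile

# The fixed +/-0.8 mm vector is at least as probable under the model as
# all but the most extreme genuine draws, i.e. it sits inside the
# high-probability region of IID N(0, 1 mm).
print("in high-probability region:", actual >= first_pct)
```

The point matches the comment: the deterministic ±0.8mm pattern would never pass a test of normality or independence, yet its density under IID N(0,1mm) is comparable to that of typical samples, which is all the argument requires.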

March 12, 2013 at 8:07 am

Yes, it’s all about knowledge of the properties of a measuring ruler in that example, and it requires that the measurement process have some stable properties to tell, from a reading, approximately how much you weigh. Eyeballing will rarely do. Blending the reading with my wishes and hopes will not do either. I really fail utterly to see where you think it is frequentists who make assumptions about the past/future. We CAN check whether we have applied a tool correctly for the case at hand, and we improve skills by deliberately falsifying models and getting good at detecting the falsifications. If there are any complete fantasies here, they are the Bayesians’. The assumption of having priors is scarcely weak, scarcely checkable, and of very questionable relevance to finding things out. You seem to think that just believing it is so makes it so: you believe it’s in the high-probability manifold, so it is. And by “good inferences” you must mean those judged good by the same a priori assumptions.

You have not pointed out a single fantasy in my view, and your repeating for the nth time the same thoughtless, unjustified roster of criticisms demonstrates your unwillingness to truly grapple with any of the accusations yourself.

March 12, 2013 at 1:54 pm

Dr. Mayo,

A probability distribution P(x) is “good” if the true value of x is in the high-probability region of P. This is true for sampling distributions and priors alike. Any strategy whereby you can arrange to only work with distributions that have this property will lead to good inferences.

For example, the measurement errors might be .8mm, .8mm, .8mm, -.8mm, -.8mm, -.8mm. Obviously we don’t know these values when making inferences, but we do know that each error is on the order of 1mm or less. That knowledge is enough for us to say that the vector of errors will be in the high-probability region of IID N(0,1mm).

Similarly with the prior, I may know the object I’m measuring has positive length but is less than a football field. So if we use the prior N(0,100m) it’s guaranteed that the true length is in the high probability region of the prior.

Now from a Frequentist view, this is completely insane. The prior is just made up nonsense, and the errors aren’t even close to being random, independent or normal. But both the sampling distribution and the prior satisfy the definition of a “good” distribution by construction and so will work. In other words, the Bayesian posterior intervals will contain the true length at pretty much any level of alpha!

I stress that what’s being said here is mathematical. You can keep ignoring the point if you want, but it’s still just as mathematically true. The Bayesian used genuine knowledge (both objectively true and known) to construct two distributions which ultimately identified a correct range for the length. The Frequentist assumption that the present and future errors have a histogram that looks like N(0,1mm), and that repeated replications of this process will yield (1-alpha)% coverage, is a complete fantasy.
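The example above can be run end to end with the standard conjugate normal-normal update. A hedged sketch: the errors, the N(0,1mm) likelihood, and the N(0,100m) prior are taken from the comment, while the true length of 1000mm is an invented value for illustration.

```python
import math

# Hypothetical setup following the comment's numbers:
# true length 1000 mm, errors of +/-0.8 mm (unknown to the analyst).
true_length = 1000.0                      # mm
errors = [0.8, 0.8, 0.8, -0.8, -0.8, -0.8]
data = [true_length + e for e in errors]  # the six ruler readings

# Model from the comment: IID N(0, 1 mm) errors, prior N(0, 100 m).
sigma = 1.0                  # mm, assumed measurement sd
mu0, tau0 = 0.0, 100_000.0   # prior mean and sd, in mm (100 m)

# Conjugate normal-normal update for the unknown length mu:
# posterior precision is the sum of prior and data precisions,
# posterior mean is the precision-weighted average.
n = len(data)
post_prec = 1.0 / tau0**2 + n / sigma**2
post_mean = (mu0 / tau0**2 + sum(data) / sigma**2) / post_prec
post_sd = math.sqrt(1.0 / post_prec)

# Central 95% posterior interval for the length.
lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"95% posterior interval: ({lo:.2f}, {hi:.2f}) mm")
print("contains true length:", lo < true_length < hi)
```

Because the prior sd (100m) is enormous relative to the measurement sd (1mm), the posterior is dominated by the data: the interval is essentially the sample mean ±1.96·(1mm)/√6, and it does contain the true length, exactly as the comment claims.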

March 12, 2013 at 3:56 am

Gee, and here I was thinking “it is just as well he lost my late-night comment, which I probably shouldn’t have sent.” If I’d known you would turn it into a post I would have sent you a pic with me on a camel in Egypt. Instead I put it on my blog today: errorstatistics.com

March 12, 2013 at 7:34 am

Here’s the camel!

March 12, 2013 at 7:38 am

You? You probably wouldn’t enjoy carrying me around in the heat of Egypt’s sand dunes, but I have a funny story about the camels and camel keepers that I should tell some time….

March 12, 2013 at 12:43 am

(recognising that I’m coming into this without any real experience in statistical philosophy and, therefore, almost certainly re-treading a very tired argument. But still…)

In what way is a likelihood well-defined? I mean, as someone with something that, under weak inspection, would appear to be a mathematical background, I can probably enumerate the properties that a function needs to have in order to be a “likelihood”, but philosophically (which is the subject of the paper/comments/rejoinder/comment) I don’t see how that helps. Practically (and I like to pretend that I am a ‘pragmatic Bayesian’, whatever that means), the likelihood for me represents the totality of *conveniently representable* knowledge of the observation process. [For completeness, the priors are an awkward melange of preconceived notions, defaults, and stuff I had to do to make the estimation work.] I fail to see how, in this form, the likelihood has any particular ‘purity’. And I struggle to see how, for reasonably complex problems, the likelihood can be in any way more ‘objective’ than this.

Maybe the comment is arguing against statistical modelling as a whole? Because the (not unreasonable, but maybe impractical) argument around “standards” surely applies to “likelihood” concepts like “sampling”, “outliers”, “missing data” etc.

I guess frequentist methods [from "guess&check + bootstrap" to full theory] avoid this, but maybe at a loss of interpretability.

March 12, 2013 at 12:46 am

For super-completeness, for the problems that I am most interested in we must battle with floating point arithmetic to get even a scintilla of inference done, so the idea of using priors to “just make things work” is not totally anathema to science [as far as I'm concerned]. The other option is not being able to do anything.