My apologies: I meant “min,” not “max.” In the paper “Error Statistics,” Dr. Mayo uses a “min severity” rule. See for example:

“How do we calculate [the severity] when μ ≤ .2 is a composite claim? We need only to calculate it for the point μ=.2 because μ values less than .2 would yield an even higher SEV value.”
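A minimal numerical sketch of that boundary argument (my numbers and parametrization, not the paper's): in a one-sided Normal model the relevant tail area is monotone in mu, so its minimum over the composite region mu ≤ .2 sits at the boundary point mu = .2, and mu values below .2 only push the tail area higher. That is why only the single point needs to be computed.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def tail(mu, xbar=0.44, n=100, sigma=1.0):
    # P(Xbar <= xbar_obs; mu): the tail area whose minimum over mu <= .2
    # supplies the severity value (numbers here are hypothetical)
    return Phi((xbar - mu) * sqrt(n) / sigma)

for mu in (0.20, 0.15, 0.10, 0.05, 0.00):
    print(f"mu = {mu:.2f}  tail = {tail(mu):.4f}")
# The tail area only grows as mu moves below .2, so the minimum over the
# composite is attained at the boundary point mu = .2.
```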

In reference to Cox’s Theorem note that from Mayo’s Error Statistics paper:

SEV(H)+SEV(not H)=1

Thus the sum rule is already satisfied. So Dr. Mayo has to do something like:
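As a quick numerical illustration (my own Normal-model parametrization and numbers, not taken from the paper), the two one-sided severity assessments are complementary tail areas of the same sampling distribution, so they sum to one automatically:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical numbers: X ~ N(mu, 1), n = 100, observed mean xbar = 0.44
xbar, n, sigma, mu1 = 0.44, 100, 1.0, 0.2
z = (xbar - mu1) * sqrt(n) / sigma

sev_gt = Phi(z)        # SEV(mu > mu1): tail area below the observed mean at mu1
sev_le = 1.0 - Phi(z)  # SEV(mu <= mu1): the complementary tail area
print(sev_gt + sev_le)  # 1.0
```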

SEV(H,J)=max{SEV(H),SEV(J)}

Because if she ever identifies a “conditional severity” and writes

SEV(H,J)=SEV(H:J)SEV(J)

Then she is, for all practical purposes, assigning Bayesian-style probabilities to hypotheses, which she dogmatically insists cannot be done.

I get the feeling that she’d have liked to avoid this altogether by never defining severity for a composite hypothesis, but that would make the concept useless. So she picked the very special example of IID normals, and then took the severity of the composite to be the maximum of the individual severities.

In that case, the answer is the same as the Bayesian P(mu>mu1: z0), so the resulting numbers appear to correct all the problems with p-values. However, that doesn’t mean that SEV is the same as a posterior probability. The difference arises because they follow different rules of composition.

Consider H = “mu > mu1” and H’ = “mu1 + 10^(-1000) > mu > mu1”.

According to her “maximum rule,” in the example from which this is drawn in her “Error Statistics” paper:

SEV(H)=SEV(H’).

I respectfully submit that H and H’ have NOT been tested with the same severity.
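A numerical sketch of the contrast (my numbers, not the paper's: X ~ N(mu, 1), n = 100, xbar = 0.44, mu1 = 0.2; eps = 1e-12 stands in for 10^(-1000), which underflows double precision). Under the boundary/maximum rule both H and the sliver H’ receive the tail area evaluated at the same point mu1, while a flat-prior Bayesian posterior, which obeys the sum rule, gives the sliver essentially zero probability:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical numbers; eps stands in for 10^(-1000)
xbar, n, sigma, mu1, eps = 0.44, 100, 1.0, 0.2, 1e-12
se = sigma / sqrt(n)

def sev_boundary(mu):
    # Under the boundary rule, a claim whose infimum point is mu gets the
    # tail area evaluated at that single point.
    return Phi((xbar - mu) / se)

sev_H  = sev_boundary(mu1)   # H : mu > mu1
sev_Hp = sev_boundary(mu1)   # H': mu1 < mu < mu1 + eps -- same boundary point

# Flat-prior Bayesian posterior: mu | xbar ~ N(xbar, se^2)
post_H  = 1.0 - Phi((mu1 - xbar) / se)
post_Hp = Phi((mu1 + eps - xbar) / se) - Phi((mu1 - xbar) / se)

print(sev_H == sev_Hp)   # True: the sliver is "tested" just as severely as H
print(post_H, post_Hp)   # the posterior gives the sliver almost nothing
```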

The P(d(Z)>d(z0): mu>mu1) is effectively defined as P(d(Z)>d(z0): mu1) because the value mu1 leads to a maximum “severity”. Of course this dodge only works because of the nice problem she chooses (i.e. exponential family distributions).

This is actually a big problem for Mayo, but she’s so dogmatically sure of her philosophy that she won’t look at the technical details long enough to see why.

The problem comes from Cox’s Theorem (made famous by Jaynes). The hypothesis mu>mu1 is a compound hypothesis. So according to Cox’s theorem you should handle the composite hypothesis using the product rule [A&B]=[A:B][B] or you’ll run into absurdities (an “absurdity” here being as defined by the statement of the theorem itself). Taking [A&B] = max{ [A],[B] } violates this rule, and it’s not hard to think of examples where this is clearly wrong. (Mayo simply denies that Cox’s theorem applies, apparently unaware that it is a statement of mathematics and not philosophy.)
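One such example, a toy case of my own rather than anything from the paper: a single fair six-sided die, where the product rule reproduces the direct count for a conjunction while the max rule assigns the conjunction a plausibility larger than one of its own conjuncts.

```python
from fractions import Fraction

# One fair six-sided die (toy example, exact arithmetic via Fraction)
outcomes = frozenset(range(1, 7))
A = frozenset({2, 4, 6})   # "the die shows an even number"
B = frozenset({1, 2})      # "the die shows at most 2"

def prob(event):
    return Fraction(len(event), len(outcomes))

p_A, p_B = prob(A), prob(B)                    # 1/2 and 1/3
p_A_and_B = prob(A & B)                        # direct count: only {2}, so 1/6
cond_A_given_B = Fraction(len(A & B), len(B))  # [A:B] = 1/2
product_rule = cond_A_given_B * p_B            # [A:B][B] = 1/6, matches the count
max_rule = max(p_A, p_B)                       # 1/2

print(p_A_and_B, product_rule, max_rule)       # 1/6 1/6 1/2
```

Here the max rule rates “even AND at most 2” as more plausible (1/2) than “at most 2” alone (1/3), even though the conjunction is strictly more demanding.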

So the problem for Mayo is that she can’t widely apply the severity concept to non-trivial real problems. If she does, the absurdities of using [A&B] = max{ [A],[B] } will become apparent. Either she or someone else will then want to patch things up to remove the problems. But once you patch them up, it will bring the whole analysis ever closer to Bayesian statistics (via the magic of Cox’s theorem).

It’s already much closer than she thinks, because she dodged this problem in the example above by reducing her calculation to the equivalent of the Bayesian P(mu>mu1: z0). As long as you don’t stray too far, the resulting numbers (unsurprisingly) seem like they remove the flaws of classical p-value type statistics.

For the example given in the paper “Error Statistics”, this integral is exactly the same as the integral used to compute the Bayesian posterior P(mu>mu1: z0) using a uniform prior for mu. Just use a change of variables to transform the Bayesian integral into the one used to compute P(d(Z)>d(z0):mu>mu1).
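The identity is easy to check numerically; here is a sketch with my own numbers and parametrization (X ~ N(mu, 1), n = 100, xbar = 0.44, mu1 = 0.2), comparing the closed-form frequentist tail area against a brute-force integration of the flat-prior posterior density N(xbar, se²) over mu > mu1:

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical numbers: X ~ N(mu, 1), n = 100, xbar = 0.44, mu1 = 0.2
xbar, n, sigma, mu1 = 0.44, 100, 1.0, 0.2
se = sigma / sqrt(n)

# Frequentist tail area used for the severity assessment
freq_tail = Phi((xbar - mu1) / se)

# Flat-prior posterior mu | xbar ~ N(xbar, se^2), integrated over mu > mu1
# by a midpoint rule (truncated 10 standard errors past xbar)
step = se / 2000.0
m = mu1 + step / 2.0
post_tail = 0.0
while m < xbar + 10.0 * se:
    post_tail += exp(-0.5 * ((m - xbar) / se) ** 2) * step
    m += step
post_tail /= se * sqrt(2.0 * pi)

print(abs(freq_tail - post_tail) < 1e-6)  # True: the two tails agree
```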

which is not defined from a frequentist perspective? (I should not have used *conditioning* there, as this is conditioning only from a Bayesian perspective.) Or the remarks about conditioning upon ancillary statistics and the insufficiency of the conditioning ancillary statistic’s distribution to signal departures from the null?