## calibrating approximate credible sets

Posted in Books, Statistics on October 26, 2018 by xi'an

Earlier this week, Jeong Eun Lee, Geoff Nicholls, and Robin Ryder arXived a paper on the calibration of approximate Bayesian credible intervals. (Warning: all three authors are good friends of mine!) They start from the core observation, which dates back to Monahan and Boos (1992), of exchangeability between θ being generated from the prior and φ being generated from the posterior associated with one observation generated from the prior predictive. (There is no name for this distribution, other than the prior, that is!) A setting amenable to ABC considerations! Actually, Prangle et al. (2014) rely on this property for assessing the ABC error, while pointing out that the test for exchangeability is not fool-proof since it works equally for two generations from the prior.

“The diagnostic tools we have described cannot be “fooled” in quite the same way checks based on the exchangeability can be.”

The paper thus proposes methods for computing the coverage [under the true posterior] of a credible set computed using an approximate posterior. (I had to fire up a few neurons to realise this was the right perspective, rather than the reverse!) A first solution, which approximates the exact coverage of the approximate credible set rather than computing it, is to use a logistic regression based on some summary statistics [not necessarily in an ABC framework], the response being the simulation outcome that the parameter [simulated from the prior] at the source of the simulated data is within the credible set. Another approach is to use importance sampling when simulating from the pseudo-posterior. However this sounds dangerously close to resorting to a harmonic mean estimate, since the importance weight is the inverse of the approximate likelihood function. Not that anything unseemly transpires from the simulations.
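To make the "right perspective" concrete, here is a minimal sketch of the quantity at stake, in a toy conjugate normal model where the coverage is available in closed form. Everything in it is an assumption made for illustration (the N(0, 1) prior, the unit-variance normal likelihood, the `inflate` factor standing for a hypothetical approximation with the right centre and the wrong spread); it is not the authors' logistic regression or importance sampling estimator.

```python
from statistics import NormalDist


def exact_coverage_of_approx_interval(y, prior_sd=1.0, noise_sd=1.0,
                                      inflate=0.5, level=0.9):
    """Coverage, under the true posterior, of a central credible interval
    built from an approximate posterior (toy normal-normal model)."""
    # True posterior for theta ~ N(0, prior_sd^2), y | theta ~ N(theta, noise_sd^2).
    w = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    true_post = NormalDist(mu=w * y, sigma=(w ** 0.5) * noise_sd)
    # Hypothetical approximation: same centre, standard deviation scaled by `inflate`.
    approx_post = NormalDist(mu=true_post.mean, sigma=inflate * true_post.stdev)
    lo = approx_post.inv_cdf((1 - level) / 2)
    hi = approx_post.inv_cdf((1 + level) / 2)
    # Mass that the true posterior puts on the approximate interval.
    return true_post.cdf(hi) - true_post.cdf(lo)


print(exact_coverage_of_approx_interval(1.3, inflate=1.0))  # exact posterior: nominal level
print(exact_coverage_of_approx_interval(1.3, inflate=0.5))  # too narrow: below nominal
```

When the true posterior is not available in closed form, this is precisely the number that has to be estimated, which is where the regression and importance sampling ideas come in.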

## X-Outline of a Theory of Statistical Estimation

Posted in Books, Statistics, University life on March 23, 2017 by xi'an

While visiting Warwick last week, Jean-Michel Marin pointed out and forwarded me this remarkable paper of Jerzy Neyman, published in 1937, and presented to the Royal Society by Harold Jeffreys.

“Leaving apart on one side the practical difficulty of achieving randomness and the meaning of this word when applied to actual experiments…”

“It may be useful to point out that although we are frequently witnessing controversies in which authors try to defend one or another system of the theory of probability as the only legitimate, I am of the opinion that several such theories may be and actually are legitimate, in spite of their occasionally contradicting one another. Each of these theories is based on some system of postulates, and so long as the postulates forming one particular system do not contradict each other and are sufficient to construct a theory, this is as legitimate as any other.”

This paper is fairly long in part because Neyman starts by setting Kolmogorov’s axioms of probability. This is of historical interest but also needed for Neyman to oppose his notion of probability to Jeffreys’ (which is the same from a formal perspective, I believe!). He actually spends a fair chunk on explaining why constants cannot have anything but trivial probability measures. Getting ready to state that an a priori distribution has no meaning (p.343) and that in the rare cases it does it is mostly unknown. While reading the paper, I thought that the distinction was more in terms of frequentist or conditional properties of the estimators, Neyman’s arguments paving the way to his definition of a confidence interval. Assuming repeatability of the experiment under the same conditions and therefore same parameter value (p.344).

“The advantage of the unbiassed [sic] estimates and the justification of their use lies in the fact that in cases frequently met the probability of their differing very much from the estimated parameters is small.”

“…the maximum likelihood estimates appear to be what could be called the best “almost unbiassed [sic]” estimates.”

It is also quite interesting to read that the principle for insisting on unbiasedness is one of producing small errors, because this is not that often the case, as shown by the complete class theorems of Wald (ten years later). And that maximum likelihood is somewhat relegated to a secondary rank, almost unbiased being understood as consistent. A most amusing part of the paper is when Neyman inverts the credible set into a confidence set, that is, turning what is random into a constant and vice-versa. With a justification that the credible interval has zero or one coverage, while the confidence interval has a long-run validity of returning the correct rate of success. What is equally amusing is that the boundaries of a credible interval turn into functions of the sample, hence could be evaluated on a frequentist basis, as done later by Dennis Lindley and others like Welch and Peers, but that Neyman fails to see this and turns the bounds into hard values. For a given sample.
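As an illustration of evaluating credible bounds on a frequentist basis, once they are seen as functions of the sample, here is a minimal sketch in a toy model chosen purely for convenience (a single observation y ~ N(θ, 1) with a N(0, 1) prior, both assumptions of mine rather than anything in Neyman's paper). The central credible interval is w·y ± z·s, with w the shrinkage weight and s the posterior standard deviation, so its coverage at a fixed θ has a closed form.

```python
from statistics import NormalDist


def frequentist_coverage(theta, prior_sd=1.0, noise_sd=1.0, level=0.95):
    """P_theta(theta in credible interval(y)) when y ~ N(theta, noise_sd^2)
    and the interval is the central one under a N(0, prior_sd^2) prior."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    w = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)  # shrinkage weight
    s = (w ** 0.5) * noise_sd                            # posterior standard deviation
    # The interval w*y +/- z*s covers theta iff y falls between these two bounds.
    y_lo, y_hi = (theta - z * s) / w, (theta + z * s) / w
    sampling = NormalDist(mu=theta, sigma=noise_sd)
    return sampling.cdf(y_hi) - sampling.cdf(y_lo)


for theta in (0.0, 1.0, 3.0):
    print(theta, round(frequentist_coverage(theta), 3))
```

In this toy setting the coverage exceeds the nominal level near the prior mean and falls below it far in the tails, which is the kind of frequentist statement about credible bounds that the later matching-prior literature formalises.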

“This, however, is not always the case, and in general there are two or more systems of confidence intervals possible corresponding to the same confidence coefficient α, such that for certain sample points, E’, the intervals in one system are shorter than those in the other, while for some other sample points, E”, the reverse is true.”

The resulting construction of a confidence interval is then awfully convoluted when compared with the derivation of an HPD region, going through regions of acceptance that are the dual of a confidence interval (in the sampling space), while apparently [from my hasty read] missing a rule to order them. And rejecting the notion of a confidence interval being possibly empty, which, while being of practical interest, clashes with its frequentist backup.

## SAS on Bayes

Posted in Books, Kids, pictures, R, Statistics, University life on November 8, 2016 by xi'an

Following a question on X Validated, I became aware of the following descriptions of the pros and cons of Bayesian analysis, as perceived by whoever (Tim Arnold?) wrote SAS/STAT(R) 9.2 User’s Guide, Second Edition. I replied more specifically on the point

It [Bayesian inference] provides inferences that are conditional on the data and are exact, without reliance on asymptotic approximation. Small sample inference proceeds in the same manner as if one had a large sample. Bayesian analysis also can estimate any functions of parameters directly, without using the “plug-in” method (a way to estimate functionals by plugging the estimated parameters in the functionals).

which I find utterly confusing and not particularly relevant. The other points in the list are more traditional, except for this one

It provides interpretable answers, such as “the true parameter θ has a probability of 0.95 of falling in a 95% credible interval.”

that I find somewhat unappealing in that the 95% probability is only relevant with respect to the resulting posterior, hence has no absolute (and definitely no frequentist) meaning. The criticisms of the prior selection

It does not tell you how to select a prior. There is no correct way to choose a prior. Bayesian inferences require skills to translate subjective prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.

It can produce posterior distributions that are heavily influenced by the priors. From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.

are traditional but nonetheless irksome. Once it is acknowledged that there is no correct or true prior, it follows naturally that the resulting inference will depend on the choice of the prior and has to be understood conditional on the prior, which is why the credible interval has for instance an epistemic rather than frequentist interpretation. There is also little reason for trying to convince a fellow Bayesian statistician about one’s prior. Everything is conditional on the chosen prior and I see less and less why this should be an issue.

## checking ABC convergence via coverage

Posted in pictures, Statistics, Travel, University life on January 24, 2013 by xi'an

Dennis Prangle, Michael Blum, G. Popovic and Scott Sisson just arXived a paper on diagnostics for ABC validation via coverage diagnostics. Valid approximation diagnostics for ABC are clearly and badly needed and this was the last slide of my talk yesterday at the Winter Workshop in Gainesville. When simulation time is not an issue (!), our DIYABC software does implement a limited coverage assessment by computing the type I error, i.e. by simulating data under the null model and evaluating the number of times it is rejected at the 5% level (see sections 2.11.3 and 3.8 in the documentation). The current paper builds on a similar perspective.

The idea in the paper is that a (Bayesian) credible interval at a given credible level α should have a similar confidence level (at least asymptotically and even more for matching priors) and that simulating pseudo-data with a known parameter value allows for a Monte-Carlo evaluation of the credible interval “true” coverage, hence for a calibration of the tolerance. The delicate issue is about the generation of those “known” parameters. For instance, if the pair (θ, y) is generated from the joint distribution prior × likelihood, and if the credible region is also based on the true posterior, the average coverage is the nominal one. On the other hand, if the credible interval is based on a poor (ABC) approximation to the posterior, the average coverage should differ from the nominal one. Given that ABC is always wrong, however, this may fail to be a powerful diagnostic. In particular, when using insufficient (summary) statistics, the discrepancy should make testing for uniformity harder, shouldn’t it?
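The joint-simulation argument can be sketched in a few lines, again under toy assumptions of my own choosing (θ ~ N(0, 1), y | θ ~ N(θ, 1), and a hypothetical "poor approximation" whose standard deviation is half the true one). The three rules below are illustrative names, not anything taken from the paper or from DIYABC.

```python
import random
from statistics import NormalDist


def central_interval(dist, level):
    """Equal-tailed interval of the given level under `dist`."""
    return dist.inv_cdf((1 - level) / 2), dist.inv_cdf((1 + level) / 2)


def exact_rule(y, level):
    # True posterior in this toy model is N(y / 2, 1 / 2).
    return central_interval(NormalDist(y / 2, 0.5 ** 0.5), level)


def narrow_rule(y, level):
    # Hypothetical poor approximation: right centre, standard deviation halved.
    return central_interval(NormalDist(y / 2, 0.5 * 0.5 ** 0.5), level)


def prior_rule(y, level):
    # Ignores the data altogether and returns a prior interval.
    return central_interval(NormalDist(0.0, 1.0), level)


def average_coverage(interval_rule, n_sim=20000, level=0.9, seed=1):
    """Fraction of joint draws (theta, y) for which the interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        theta = rng.gauss(0.0, 1.0)   # theta ~ prior
        y = rng.gauss(theta, 1.0)     # y | theta ~ likelihood
        lo, hi = interval_rule(y, level)
        hits += lo <= theta <= hi
    return hits / n_sim


for rule in (exact_rule, narrow_rule, prior_rule):
    print(rule.__name__, average_coverage(rule))
```

The exact rule recovers the nominal level on average and the narrow one does not, but the prior rule also recovers it, since θ is itself a draw from the prior. Average coverage alone therefore cannot tell the true posterior from the prior, which is one reason to doubt the power of the diagnostic.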

## adjustment of bias and coverage for confidence intervals

Posted in Statistics on October 18, 2012 by xi'an

Menéndez, Fan, Garthwaite, and Sisson (whom I heard in Adelaide on that subject) posted yesterday a paper on arXiv about correcting the frequentist coverage of default intervals toward their nominal level. Given such an interval [L(x),U(x)], the correction for proper frequentist coverage is done by parametric bootstrap, i.e. by simulating n replicas of the original sample from the plug-in density f(.|θ*) and deriving the empirical cdf of L(y)-θ*. And of U(y)-θ*. Under the assumption of consistency of the estimate θ*, this ensures convergence (in the original sample size) of the corrected bounds.
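A minimal sketch of this parametric bootstrap step follows, with loud caveats: the normal-mean model, the deliberately too-narrow `naive_interval`, and above all the additive form of the correction (shifting each bound by a bootstrap quantile of L(y)-θ* or U(y)-θ*) are assumptions of mine, chosen because they are easy to check, and not necessarily the correction used in the paper.

```python
import random
from statistics import mean


def naive_interval(sample, sigma=1.0):
    """Deliberately too-narrow interval for a normal mean: xbar +/- 1 standard error."""
    xbar, se = mean(sample), sigma / len(sample) ** 0.5
    return xbar - se, xbar + se


def empirical_quantile(values, p):
    """Crude empirical quantile read off the sorted values."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


def bootstrap_corrected_interval(sample, alpha=0.05, n_rep=5000, sigma=1.0, seed=0):
    rng = random.Random(seed)
    theta_star = mean(sample)                  # plug-in estimate
    low, upp = naive_interval(sample, sigma)
    d_low, d_upp = [], []
    for _ in range(n_rep):
        # Replica of the original sample, simulated from f(.|theta_star).
        y = [rng.gauss(theta_star, sigma) for _ in sample]
        l_y, u_y = naive_interval(y, sigma)
        d_low.append(l_y - theta_star)
        d_upp.append(u_y - theta_star)
    # Assumed additive correction: shift each bound so that, under theta_star,
    # it falls on the wrong side of the parameter with probability alpha / 2.
    return (low - empirical_quantile(d_low, 1 - alpha / 2),
            upp - empirical_quantile(d_upp, alpha / 2))


rng = random.Random(42)
x = [rng.gauss(2.0, 1.0) for _ in range(25)]
print(naive_interval(x), bootstrap_corrected_interval(x))
```

In this toy case the corrected half-width should land close to the textbook 1.96 standard errors, up to bootstrap noise. Note that every replica requires recomputing the interval, which is cheap here and anything but cheap when the interval itself comes out of an ABC run.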

Since ABC is based on the idea that pseudo data can be simulated from f(.|θ) for any value of θ, the concept “naturally” applies to ABC outcomes, as illustrated in the paper by a g-and-k noise MA(1) model. (As noted by the authors, there always is some uncertainty with the consistency of the ABC estimator.) However, there are a few caveats:

• ABC usually aims at approximating the posterior distribution (given the summary statistics), of which the credible intervals are an inherent constituent. Hence, attempts at recovering a frequentist coverage seem contradictory with the original purpose of the method. Obviously, if ABC is instead seen as an inference method per se, like indirect inference, this objection does not hold.
• Then, once the (umbilical) link with Bayesian inference is partly severed, there is no particular reason to stick to credible sets for [L(x),U(x)]. A more standard parametric bootstrap approach, based on the bootstrap distribution of θ*, should work as well. This means that a comparison with other frequentist methods like indirect inference could be relevant.
• Finally, and this is also noted by the authors, the method may prove extremely expensive. If the bounds L(x) and U(x) are obtained empirically from an ABC sample, a new ABC computation must be associated with each one of the n replicas of the original sample. It would be interesting to compare the actual coverages of this ABC-corrected method with a more direct parametric bootstrap approach.

## testing via credible sets

Posted in Statistics, University life on October 8, 2012 by xi'an

Måns Thulin released today an arXiv document on some decision-theoretic justifications for [running] Bayesian hypothesis testing through credible sets. His main point is that using the unnatural prior setting mass on a point-null hypothesis can be avoided by rejecting the null when the point-null value of the parameter does not belong to the credible interval and that this decision procedure can be validated through the use of special loss functions. While I stress to my students that point-null hypotheses are very unnatural and should be avoided at all cost, and also that constructing a confidence interval is not the same as designing a test (the former assesses the precision in the estimation, while the latter opposes two different and even incompatible models), let us consider Måns’ arguments for their own sake.

The idea of the paper is that there exist loss functions for testing point-null hypotheses that lead to HPD, symmetric and one-sided intervals as acceptance regions, depending on the loss function. This was already found in Pereira & Stern (1999). The issue with these loss functions is that they involve the corresponding credible sets in their definition, hence are somehow tautological. For instance, when considering the HPD set and T(x) as the largest HPD set not containing the point-null value of the parameter, the corresponding loss function is

$L(\theta,\varphi,x) = \begin{cases}a\mathbb{I}_{T(x)^c}(\theta) &\text{when }\varphi=0\\ b+c\mathbb{I}_{T(x)}(\theta) &\text{when }\varphi=1\end{cases}$

parameterised by a,b,c. And depending on the HPD region.
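To see why this loss returns an HPD region as acceptance region, it helps to write out the posterior expected losses. This is a short derivation under assumptions that are mine: a continuous posterior, a + c > 0, and the convention (not stated above) that φ = 1 stands for accepting the null. Writing $p(x) = P(\theta\in T(x)\mid x)$,

$\mathbb{E}[L(\theta,0,x)\mid x] = a\,(1-p(x)), \qquad \mathbb{E}[L(\theta,1,x)\mid x] = b + c\,p(x),$

so that φ = 1 is the Bayes decision if and only if

$b + c\,p(x) \le a\,(1-p(x)) \iff p(x) \le \frac{a-b}{a+c}.$

Since T(x) is the largest HPD set excluding the point-null value, its posterior mass is at most (a-b)/(a+c) exactly when that value lies within the HPD region of level (a-b)/(a+c). The loss thus reproduces the HPD test, and the tautology shows in the fact that T(x) had to be built into L to get there.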

Måns then introduces new loss functions that do not depend on x and still lead to either the symmetric or the one-sided credible intervals as acceptance regions. However, one test actually has two different alternatives (Theorem 2), which makes it essentially a composition of two one-sided tests, while the other test returns the result to a one-sided test (Theorem 3), so even at this face-value level, I do not find the result that convincing. (For the one-sided test, George Casella and Roger Berger (1986) established links between Bayesian posterior probabilities and frequentist p-values.) Both Theorem 3 and the last result of the paper (Theorem 4) use a generic, set-free and observation-free loss function (related to eqn. (5.2.1) in my book!, as quoted by the paper) but (and this is a big but) they only hold for prior distributions setting (prior) mass on both the null and the alternative. Otherwise, the solution is to always reject the hypothesis with the zero probability… This is actually an interesting argument on the why-are-credible-sets-unsuitable-for-testing debate, as it cannot bypass the introduction of a prior mass on Θ0!

Overall, I furthermore consider that a decision-theoretic approach to testing should encompass future steps rather than focussing on the reply to the (admittedly dumb) question is θ zero? Therefore, it must have both plan A and plan B at the ready, which means preparing (and using!) prior distributions under both hypotheses. Even on point-null hypotheses.

Now, after I wrote the above, I came upon a Stack Exchange page initiated by Måns last July. This is presumably not the first time a paper stems from Stack Exchange, but this is a fairly interesting outcome: thanks to the debate on his question, Måns managed to get a coherent manuscript written. Great! (In a sense, this reminded me of the polymath experiments of Terry Tao, Timothy Gowers and others. Meaning that maybe most contributors could have become coauthors to the paper!)

## loss functions for credible regions

Posted in Statistics, University life on March 15, 2012 by xi'an

When Éric Marchand came to give a talk last week, we discussed minimality and Bayesian estimation for confidence/credible regions. In the early 1990’s, George Casella and I wrote a paper in this direction, entitled “Distance weighted losses for testing and confidence set evaluation” and published in TEST. It was restricted to the univariate case but one could consider evaluating α-level confidence regions with a loss function like

$L(\theta,C) = \left(\theta-\text{proj}_C(\theta)\right)^2$

where the projection of the parameter over C is the element in C that is closest to the parameter. As in the original paper, this loss function brings a penalty for how far the parameter is from the region, compared with the rudimentary 0-1 loss function which penalises all misses the same way. The posterior loss is not straightforward to minimise, though. Unless one considers an approximation based on a sample from the posterior and picks the (1-α)-fraction that gives the smallest sum of distances to the remaining α-fraction. And then takes a convexification of the α-fraction. This is not particularly “clean” and I would prefer to find an HPD-like region, i.e. an HPD linked to a modified prior… But this may require another loss function than the one above. Incidentally, I was also playing with an alternative loss function that would avoid setting the level α. Namely

$L(\theta,C) = \left(\theta-\text{proj}_C(\theta)\right)^2 + \tau\, \text{diam}(C)^2,$

which simultaneously penalises non-coverage and size. However, the choice of τ makes the function difficult to motivate in a realistic setting.
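To get a feel for how τ drives the answer, here is a minimal sketch that approximates the posterior expected loss by an average over a posterior sample and minimises it by brute force. It rests on several assumptions of convenience: a univariate parameter, a stand-in N(0, 1) posterior, regions restricted to symmetric intervals [-r, r], and a plain grid search rather than anything clever.

```python
import random


def distance_loss(theta, lo, hi):
    """Squared distance between theta and its projection on the interval [lo, hi]."""
    proj = min(max(theta, lo), hi)
    return (theta - proj) ** 2


def expected_loss(sample, lo, hi, tau=0.0):
    """Monte Carlo posterior expected loss, with size penalty tau * diam(C)^2."""
    miss = sum(distance_loss(t, lo, hi) for t in sample) / len(sample)
    return miss + tau * (hi - lo) ** 2


def best_symmetric_radius(sample, tau, grid):
    """Grid search over intervals [-r, r] (the symmetry is an assumption)."""
    return min(grid, key=lambda r: expected_loss(sample, -r, r, tau))


rng = random.Random(0)
posterior_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in posterior
grid = [0.05 * k for k in range(81)]                           # radii from 0 to 4
radii = {tau: best_symmetric_radius(posterior_sample, tau, grid)
         for tau in (0.001, 0.01, 0.1)}
print(radii)
```

The optimal radius shrinks as τ grows, so picking τ amounts to picking a coverage level in disguise, which is exactly the difficulty with motivating it.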