**E**dwin Fong and Chris Holmes (Oxford) just wrote a paper on scalable Bayesian methods from an M-open perspective, borrowing from the conformal prediction framework of Vovk et al. (2005) to achieve frequentist coverage for prediction intervals. The method starts with the choice of a conformity measure that quantifies how well each observation agrees with the rest of the sample. Being exchangeable, the conformities lead to a rank statistic from which a p-value can be derived, namely the empirical cdf associated with the observed conformities. Following Vovk et al. (2005) and Wasserman (2011), Edwin and Chris note that the Bayesian predictive itself acts like a conformity measure, a predictive that can be approximated by MCMC and importance sampling (possibly Pareto smoothed). The paper also extends the setting to partially exchangeable models, renamed group conformal predictions. While reluctant to engage in turning Bayesian solutions into frequentist ones, I can see some worth in deriving both in order to expose discrepancies and hence signal possible issues with models and priors.
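The conformity-to-rank-to-p-value mechanism can be sketched in a few lines. This is a generic full-conformal toy of mine, not the authors' construction: the `near_mean` conformity measure is a hypothetical stand-in for the Bayesian predictive density used in the paper.

```python
import numpy as np

def conformal_pvalue(sample, y, conformity):
    """Conformal p-value of candidate y: the empirical cdf of the
    conformities of the augmented sample, evaluated at y's conformity."""
    aug = np.append(sample, y)
    scores = conformity(aug)              # one conformity score per observation
    return np.mean(scores <= scores[-1])  # rank statistic, valid by exchangeability

def conformal_set(sample, grid, conformity, alpha=0.1):
    """Level (1 - alpha) prediction set: candidates whose p-value exceeds alpha."""
    return np.array([y for y in grid
                     if conformal_pvalue(sample, y, conformity) > alpha])

# hypothetical conformity measure: agreement = closeness to the augmented mean
near_mean = lambda aug: -np.abs(aug - aug.mean())

rng = np.random.default_rng(0)
sample = rng.normal(size=100)
grid = np.linspace(-4.0, 4.0, 801)
region = conformal_set(sample, grid, near_mean, alpha=0.1)
print(region.min(), region.max())  # roughly the central 90% of the sample
```

In the paper, the conformity measure is instead the Bayesian predictive itself, approximated by MCMC or (Pareto smoothed) importance sampling, which is what turns a posterior predictive into an interval with frequentist coverage.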

## Archive for frequentist coverage

## Conformal Bayesian Computation

Posted in Books, pictures, Statistics, University life with tags Bayesian predictive, conformal prediction, frequentist coverage, p-value, Pareto smoothed importance sampling, University of Oxford on July 8, 2021 by xi'an

## look, look, confidence! [book review]

Posted in Books, Statistics, University life with tags ABC, amazon associates, Bayesian foundations, BibTeX, book review, confidence distribution, confidence intervals, epistemic probability, fiducial distribution, frequentist coverage, Neyman-Scott problem, Nobel Prize, Norway, prior free posterior, Quenouille, survey, whales on April 23, 2018 by xi'an

**A**s it happens, I recently bought [with Amazon Associate earnings] a (used) copy of Confidence, Likelihood, Probability (Statistical Inference with Confidence Distributions), by Tore Schweder and Nils Hjort, to try to understand this confusing notion of confidence distributions. (And hence did not get the book from CUP or anyone else with the purpose of writing a review. Or a ½-review like the one below.)

“Fisher squared the circle and obtained a posterior without a prior.” (p.419)

Now that I have gone through a few chapters, I am no less confused about the point of this notion. Which seems to rely on the availability of confidence intervals, exact or asymptotic ones. The authors plainly recognise (p.61) that a confidence distribution is neither a posterior distribution nor a fiducial distribution, hence cutting off any possible Bayesian usage of the approach. Which seems right in that there is no coherence behind the construct, meaning for instance that there is no joint distribution corresponding to the resulting marginals. Or even a specific dominating measure on the parameter space. (Always go looking for the dominating measure!) As usual with frequentist procedures, there is always a feeling of arbitrariness in the resolution, as for instance in the Neyman-Scott problem (p.112), where the profile likelihood and the deviance do not work, but considering directly the distribution of the (inconsistent) MLE of the variance "saves the day", which sounds a bit like starting from the solution. Another statistical freak, the Fieller-Creasy problem (p.116), remains a freak in this context as it does not seem to allow for a confidence distribution. I also notice an ambivalence in the discourse of the authors, namely that confidence distributions are claimed to stand both outside a probabilisation of the parameter and inside one, "producing distributions for parameters of interest given the data (…) with fewer philosophical and interpretational obstacles" (p.428).

“Bias is particularly difficult to discuss for Bayesian methods, and seems not to be a worry for most Bayesian statisticians.” (p.10)

The discussions as to whether or not confidence distributions form a synthesis of Bayesianism and frequentism always fall short of being convincing, the choice of (or the dependence on) a prior distribution appearing to the authors as a failure of the former approach. Or as unnecessarily complicated when there are nuisance parameters. Apparently missing the (high) degree of subjectivity involved in creating the confidence procedures. Chapter 1 contains a section on "Why not go Bayesian?" that starts from Chris Sims‘ Nobel Lecture on the appeal of Bayesian methods and goes [softly] rampaging through each item. One point (3) is recurrent in many criticisms of B and I always wonder whether or not it is tongue-in-cheek-y… Namely the fact that parameters of a model are rarely if ever stochastic. This is a misrepresentation of the use of prior and posterior distributions, which are in fact summaries of information cum uncertainty about a true fixed parameter. That the book refuses to endow posteriors with an epistemic meaning (except for "Bayesian of the Lindley breed", p.419) is thus most curious. (The debate is repeated in the final(e) chapter as "why the world need not be Bayesian after all".)

“To obtain frequentist unbiasedness, the Bayesian will have to choose her prior with unbiasedness in mind. Is she then a Bayesian?” (p.430)

A general puzzling feature of the book is that notions are not always immediately defined, but rather discussed and illustrated first, as for instance the central notion of fiducial probability (Section 1.7, then Chapter 6), maybe because Fisher himself did not have a general principle to advance. The construction of a confidence distribution most often keeps a measure of mystery (and arbitrariness), outside the rather stylised setting of exponential families and (conditionally) sufficient statistics. (Incidentally, our 2012 ABC survey is [kindly] quoted in relation with approximate sufficiency (p.180), while it does not sound particularly related to this part of the book. Now, is there an ABC version of confidence distributions? Or an ABC derivation?) This is not to imply that the book is uninteresting!, as I found reading it quite entertaining, with many humorous and tongue-in-cheek remarks, like "From Fraser (1961a) and until Fraser (2011), and hopefully even further" (p.92), and great datasets. (Including one entitled *Pornoscope*, which is about *drosophila* mating.) And also datasets of lesser greatness, like the 3000 mink whales that were killed for Example 8.5, where the authors if not the whales "are saved by a large and informative dataset"… (Whaling is a recurrent [national?] theme throughout the book, along with sport statistics usually involving Norway!)

Miscellanea: the interest of the authors in the topic is credited to bowhead whales, more precisely to Adrian Raftery's geometric merging (or melding) of two priors and to the resulting Borel paradox (xiii), a proposal that I remember Adrian presenting in Luminy, presumably in 1994, or maybe in Aussois the year after. The book also repeats Don Fraser's notion that the likelihood is a sufficient statistic, a point that still bothers me. (On the side, I realised while reading Confidence, &tc., that ABC cannot comply with the likelihood principle.) To end up on a French nitpicking note (!), Quenouille is typ(o)ed Quenoille in the main text, the references and the index. (Blame the .bib file!)

## ACDC versus ABC

Posted in Books, Kids, pictures, Statistics, Travel with tags ABC, ACC, ACDC, Bayesian inference, frequentist coverage, Harvard University on June 12, 2017 by xi'an

**A**t the Bayes, Fiducial and Frequentist workshop last month, I discussed with the authors of this newly arXived paper, Approximate confidence distribution computing, Suzanne Thornton and Min-ge Xie. Which they abbreviate as ACC and not as ACDC. While I have discussed the notion of confidence distribution in some earlier posts, this paper aims at producing proper frequentist coverage within a likelihood-free setting. Given the proximity with our recent paper on the asymptotics of ABC, as well as with Li and Fearnhead's (2016) parallel endeavour, it is difficult (for me) to spot the actual distinction between ACC and ABC, given that we also achieve (asymptotically) proper coverage when the limiting ABC distribution is Gaussian, which is the case for a tolerance decreasing quickly enough to zero (in the sample size).

“Inference from the ABC posterior will always be difficult to justify within a Bayesian framework.”

Indeed the ACC setting is eerily similar to ABC apart from the potential of the generating distribution to be data dependent. (Which is fine when considering that the confidence distributions have no Bayesian motivation but are a tool to ensure proper frequentist coverage.) That it is “able to offer theoretical support for ABC” (p.5) is unclear to me, given both this data dependence and the constraints it imposes on the [sampling and algorithmic] setting. Similarly, I do not understand how the authors “are not committing the error of doubly using the data” (p.5) and why they should be concerned about it, standing outside the Bayesian framework. If the prior involves the data as in the Cauchy location example, it literally *uses* the data [once], followed by an ABC comparison between simulated and actual data, that *uses* the data [a second time].

“Rather than engaging in a pursuit to define a moving target such as [a range of posterior distributions], ACC maintains a consistently clear frequentist interpretation (…) and thereby offers a consistently cohesive interpretation of likelihood-free methods.”

The frequentist coverage guarantee comes from a bootstrap-like assumption that [with tolerance equal to zero] the distribution of the ABC/ACC/ACDC random parameter around an estimate of the parameter *given* the summary statistic is identical to the [frequentist] distribution of this estimate around the true parameter [given the true parameter, although this conditioning makes no sense outside a Bayesian framework]. (There must be a typo in the paper when the authors define [p.10] the estimator as minimising the derivative of the density of the summary statistic, while still calling it an MLE.) That this bootstrap-like assumption holds is established (in Theorem 1) under a CLT on this MLE and assumptions on the data-dependent proposal that connect it to the density of the summary statistic, a connection that seems to imply data-dependence as well as a certain knowledge about this density. What I find most surprising in this derivation is the total absence of conditions on, or even discussion of, the tolerance level which, as we have shown, is paramount to the validation or invalidation of ABC inference. It sounds like the authors of Approximate confidence distribution computing are setting ε equal to zero for those theoretical derivations, while in practice they apply rules [for choosing ε] they do not spell out, but which result in very different acceptance rates for the ACC version they oppose to an ABC version. (In all illustrations, it seems that ε=0.1, which does not make much sense.) All in all, I am thus rather skeptical about the practical implications of the paper, in that it seems to achieve confidence guarantees by first assuming proper if implicit choices of summary statistics and parameter generating distribution.
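For concreteness, here is a toy version (mine, not the authors') of the rejection sampler underlying both ABC and ACC, in a normal mean model with the sample mean as summary statistic, showing how the tolerance ε drives both the acceptance rate and the width of the resulting interval:

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_sample(data, n_sim=100_000, eps=0.1):
    """Toy ABC/ACC rejection sampler: N(theta, 1) data, sample mean as summary."""
    n = len(data)
    s_obs = data.mean()
    theta = rng.uniform(-5.0, 5.0, n_sim)        # data-independent proposal
    s_sim = rng.normal(theta, 1.0 / np.sqrt(n))  # simulated summary statistic
    return theta[np.abs(s_sim - s_obs) <= eps]   # keep near-matching parameters

data = rng.normal(2.0, 1.0, 50)
for eps in (1.0, 0.1, 0.01):
    post = abc_sample(data, eps=eps)
    lo, hi = np.quantile(post, [0.025, 0.975])
    print(f"eps={eps}: acceptance {post.size / 100_000:.4f}, "
          f"95% interval ({lo:.2f}, {hi:.2f})")
```

As ε grows, the acceptance rate rises and the interval inflates away from the exact posterior, which is why leaving the tolerance undiscussed in the theoretical derivations is troubling.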

## Bayes, reproducibility and the Quest for Truth

Posted in Books, Statistics, University life with tags Bayesian foundations, frequency properties, frequentist coverage, L'Aquila, Statistical Science, truth on April 27, 2017 by xi'an

**D**on Fraser, Mylène Bédard, and three coauthors have written a paper with the above dramatic title in Statistical Science about the reproducibility of Bayesian inference in the framework of what they call a mathematical prior. Connecting with the earlier quick-and-dirty tag attributed by Don to Bayesian credible intervals.

“We provide simple (…) counter-examples to general claims that Bayes can offer accuracy for statistical inference. To obtain this accuracy with Bayes, more effort is required compared to recent likelihood methods (…) [and] accuracy beyond first order is routinely not available (…) An alternative is to view default Bayes as an exploratory technique and then ask does it do as it overtly claims? Is it reproducible as understood in contemporary science? (…) No one has answers although speculative claims abound.” (p. 1)

The early stages of the paper question the nature of a prior distribution in terms of objectivity and reproducibility, which strikes me as a return to older debates on the nature of probability. And as a dubious insistence on the reality of a prior when the said reality is customarily and implicitly assumed for the sampling distribution. While we "can certainly ask how [a posterior] quantile relates to the true value of the parameter", I see no compelling reason why the associated quantile should be endowed with a frequentist coverage meaning, i.e., be more than a normative indication of the deviation from the true value. (Assuming there is such a parameter.) To consider that the credible interval of interest can be "objectively" assessed by simulation experiments evaluating its coverage is thus doomed from the start (since there is no reason for the nominal coverage to hold) and situated on the wrong plane, since it stems from the hypothetical frequentist model for a range of parameter values. Instead, I find simulations from (generating) models useful in a general ABC sense, namely that by producing realisations from the predictive one can assess to which degree of roughness the data is compatible with the formal construct. To bind reproducibility to the frequentist framework thus sounds wrong [to me] as being model-based. In other words, I do not find the definition of reproducibility used in the paper to be objective (literally bouncing back from the Gelman and Hennig Read Paper).
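The dependence of such coverage experiments on the hypothetical "true" value is easy to exhibit; a minimal simulation (my own toy conjugate normal mean model, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def credible_coverage(theta0, n=20, tau=0.5, reps=2000, z=1.96):
    """Frequentist coverage, at a fixed true value theta0, of the 95% credible
    interval for the mean of a N(theta, 1) sample under a N(0, tau^2) prior."""
    post_var = 1.0 / (n + 1.0 / tau**2)       # conjugate posterior variance
    hits = 0
    for _ in range(reps):
        x = rng.normal(theta0, 1.0, n)
        post_mean = post_var * n * x.mean()   # shrinkage towards the prior mean 0
        hits += abs(post_mean - theta0) <= z * np.sqrt(post_var)
    return hits / reps

print(credible_coverage(0.0))  # above the nominal 95% near the prior mean
print(credible_coverage(3.0))  # far below it away from the prior mean
```

The same credible interval over- or under-covers depending on where the hypothetical truth sits relative to the prior, which is exactly why a coverage experiment run over "a range of parameter values" cannot "objectively" validate the posterior.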

At several points in the paper, the legal consequences of using a subjective prior are evoked as binding and implicitly as dangerous, with the example of the L'Aquila expert trial. I have trouble seeing the relevance of this entry, as an adverse lawyer is just as entitled to attack the expert on her or his sampling model. More fundamentally, I feel quite uneasy about bringing this type of argument into the debate!

## Bayes, reproducibility, and the quest for truth

Posted in Books, Kids, Statistics, University life with tags accuracy, all models are wrong, Bayes(Pharma), expertise, frequentist coverage, L'Aquila, legal statistics, magnitude, non-reproducible research, Richter scale on September 2, 2016 by xi'an

“Avoid opinion priors, you could be held legally or otherwise responsible.”

**D**on Fraser, Mylène Bédard, Augustine Wong, Wei Lin, and Ailana Fraser wrote a paper to appear in Statistical Science, with the above title. This paper is a continuation of Don's assessment of Bayes procedures in earlier Statistical Science [which I discussed] and Science 2013 papers, which I would qualify, with all due respect, as a demolition enterprise [of the Bayesian approach to statistics]… The argument is similar in that "reproducibility" is to be understood therein as providing frequentist confidence assessment. The authors also use "accuracy" in this sense. (As far as I know, there is no definition of *reproducibility* to be found in the paper.) Some priors are *matching* priors, in the (restricted) sense that they give second-order accurate frequentist coverage. Most are not matching and none is third-order accurate, a level that may be attained by alternative approaches. As far as the abstract goes, this seems to be the crux of the paper. Which is fine, but does not qualify in my opinion as a criticism of the Bayesian paradigm, given that (a) it makes no claim at frequentist coverage and (b) I see no reason why proper coverage should be connected with "truth" or "accuracy". It truly makes no sense to me to attempt either to put a frequentist hat on posterior distributions or to check whether or not the posterior is "valid", "true" or "actual". I similarly consider that Efron's "genuine priors" do not belong to the Bayesian paradigm but are on the contrary anti-Bayesian, in that they suggest all priors should stem from frequency modelling, to borrow the terms from the current paper. (This is also the position of the authors, who consider they have "no Bayes content".)

Among their arguments, the authors refer to two tragic real cases: the earthquake at L'Aquila, where seismologists were charged (and then discharged) with manslaughter for asserting there was little risk of a major earthquake, and the indictment of the pharmaceutical company Merck for the deadly side-effects of its drug Vioxx. The paper however never returns to those cases and fails to explain in which sense this is connected with the lack of reproducibility or truth(ful)ness of Bayesian procedures. If anything, the moral of the L'Aquila story is that statisticians should not draw definitive conclusions, like the absence of risk of a major earthquake or its improbability. There is a strange if human tendency for experts to reach definitive conclusions and to omit the many layers of uncertainty in their models and analyses. In the earthquake case, seismologists do not know how to predict major quakes from the previous activity, and that should have been the [non-]conclusion of the experts. Which could possibly have been reached by a Bayesian modelling that always includes uncertainty. But the current paper does not at all operate at this (epistemic?) level, as it never questions the impact of the choice of a likelihood function or of a statistical model in the reproducibility framework. First, third or 47th order accuracy nonetheless operates strictly within the referential of the chosen model, and providing the data to another group of scientists, experts or statisticians would invariably produce a different statistical modelling. So much for reproducibility or truth.