**A** definitely brilliant entry on xkcd that reflects upon the infinite regress of producing error evaluations that are based on estimates. A must for the next class when I introduce error bars and confidence intervals!

## Archive for confidence intervals

## look, look, confidence! [book review]

Posted in Books, Statistics, University life with tags ABC, amazon associates, Bayesian foundations, BibTeX, book review, confidence distribution, confidence intervals, epistemic probability, fiducial distribution, frequentist coverage, Neyman-Scott problem, Nobel Prize, Norway, prior free posterior, Quenouille, survey, whales on April 23, 2018 by xi'an

**A**s it happens, I recently bought [with Amazon Associate earnings] a (used) copy of Confidence, Likelihood, Probability (Statistical Inference with Confidence Distributions), by Tore Schweder and Nils Hjort, to try to understand this confusing notion of confidence distributions. (And hence did not get the book from CUP or anyone else towards purposely writing a review. Or a ½-review like the one below.)

“Fisher squared the circle and obtained a posterior without a prior.” (p.419)

Now that I have gone through a few chapters, I am no less confused about the point of this notion. Which seems to rely on the availability of confidence intervals. Exact or asymptotic ones. The authors plainly recognise (p.61) that a confidence distribution is neither a posterior distribution nor a fiducial distribution, hence cutting off any possible Bayesian usage of the approach. Which seems right in that there is no coherence behind the construct, meaning for instance there is no joint distribution corresponding to the resulting marginals. Or even a specific dominating measure in the parameter space. (Always go looking for the dominating measure!) As usual with frequentist procedures, there is always a feeling of arbitrariness in the resolution, as for instance in the Neyman-Scott problem (p.112) where the profile likelihood and the deviance do not work, but considering directly the distribution of the (inconsistent) MLE of the variance “saves the day”, which sounds a bit like starting from the solution. Another statistical freak, the Fieller-Creasy problem (p.116), remains a freak in this context as it does not seem to allow for a confidence distribution. I also notice an ambivalence in the discourse of the authors of this book, namely that they claim confidence distributions are both outside a probabilisation of the parameter and inside it, “producing distributions for parameters of interest given the data (…) with fewer philosophical and interpretational obstacles” (p.428).
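To make the Neyman-Scott inconsistency concrete, here is a minimal simulation sketch. It assumes the textbook version of the problem, namely n pairs of Normal observations sharing a variance σ² while each pair has its own mean. The function name `neyman_scott_mle`, the Normal(0, 5²) draw for the nuisance means and the sample size are illustrative choices of mine, not taken from the book. In this setting the joint MLE of σ² converges to σ²/2 rather than σ², while 2n times the MLE divided by σ² is a χ² variate with n degrees of freedom, which is presumably the kind of exact distribution that “saves the day”.

```python
import numpy as np

def neyman_scott_mle(n_pairs, sigma=1.0, seed=0):
    """Joint MLE of sigma^2 from n_pairs pairs, each pair with its own mean."""
    rng = np.random.default_rng(seed)
    # Nuisance means, one per pair (arbitrary illustrative choice)
    mu = rng.normal(0.0, 5.0, size=n_pairs)
    # Two observations per pair, common standard deviation sigma
    x = rng.normal(mu[:, None], sigma, size=(n_pairs, 2))
    pair_means = x.mean(axis=1, keepdims=True)
    # The MLE divides the residual sum of squares by the total count 2n
    return ((x - pair_means) ** 2).sum() / (2 * n_pairs)

sigma2_hat = neyman_scott_mle(100_000)  # settles near sigma^2 / 2, not sigma^2
corrected = 2 * sigma2_hat              # rescaling restores consistency
print(sigma2_hat, corrected)
```

The factor 2 is specific to pairs: with only one degree of freedom left per pair after estimating its mean, the residual sum of squares carries n degrees of freedom while the MLE divides by 2n.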

“Bias is particularly difficult to discuss for Bayesian methods, and seems not to be a worry for most Bayesian statisticians.” (p.10)

The discussions as to whether or not confidence distributions form a synthesis of Bayesianism and frequentism always fall short of being convincing, the choice of (or the dependence on) a prior distribution appearing to the authors as a failure of the former approach. Or unnecessarily complicated when there are nuisance parameters. Apparently missing on the (high) degree of subjectivity involved in creating the confidence procedures. Chapter 1 contains a section on “Why not go Bayesian?” that starts from Chris Sims’ Nobel Lecture on the appeal of Bayesian methods and goes [softly] rampaging through each item. One point (3) is recurrent in many criticisms of Bayesian analysis and I always wonder whether or not it is tongue-in-cheek-y… Namely the fact that parameters of a model are rarely if ever stochastic. This is a misrepresentation of the use of prior and posterior distributions, which are in fact summaries of information cum uncertainty. About a true fixed parameter. Refusing as does the book to endow posteriors with an epistemic meaning (except for “Bayesian of the Lindley breed”, p.419) is thus most curious. (The debate is repeated in the final(e) chapter as “why the world need not be Bayesian after all”.)

“To obtain frequentist unbiasedness, the Bayesian will have to choose her prior with unbiasedness in mind. Is she then a Bayesian?” (p.430)

A general puzzling feature of the book is that notions are not always immediately defined, but rather discussed and illustrated first. As for instance for the central notion of fiducial probability (Section 1.7, then Chapter 6), maybe because Fisher himself did not have a general principle to advance. The construction of a confidence distribution most often keeps a measure of mystery (and arbitrariness), outside the rather stylised setting of exponential families and sufficient (conditionally so) statistics. (Incidentally, our 2012 ABC survey is [kindly] quoted in relation with approximate sufficiency (p.180), while it does not sound particularly related to this part of the book. Now, is there an ABC version of confidence distributions? Or an ABC derivation?) This is not to imply that the book is uninteresting!, as I found reading it quite entertaining, with many humorous and tongue-in-cheek remarks, like “From Fraser (1961a) and until Fraser (2011), and hopefully even further” (p.92), and great datasets. (Including one entitled *Pornoscope*, which is about *drosophila* mating.) And also datasets with lesser greatness, like the 3000 minke whales that were killed for Example 8.5, where the authors if not the whales “are saved by a large and informative dataset”… (Whaling is a recurrent [national?] theme throughout the book, along with sport statistics usually involving Norway!)

Miscellanea: The interest of the authors in the topic is credited to bowhead whales, more precisely to Adrian Raftery’s geometric merging (or melding) of two priors and to the resulting Borel paradox (xiii). Proposal that I remember Adrian presenting in Luminy, presumably in 1994. Or maybe in Aussois the year after. The book also repeats Don Fraser’s notion that the likelihood is a sufficient statistic, a point that still bothers me. (On the side, I realised while reading Confidence, &tc., that ABC cannot comply with the likelihood principle.) To end up on a French nitpicking note (!), Quenouille is typ(o)ed Quenoille in the main text, the references and the index. (Blame the .bib file!)

## X-Outline of a Theory of Statistical Estimation

Posted in Books, Statistics, University life with tags Bayesian Analysis, confidence intervals, credible intervals, Dennis Lindley, Harold Jeffreys, inference, Jerzy Neyman, maximum likelihood estimation, unbiasedness, University of Warwick, X-Outline on March 23, 2017 by xi'an

**W**hile visiting Warwick last week, Jean-Michel Marin pointed out and forwarded me this remarkable paper of Jerzy Neyman, published in 1937, and presented to the Royal Society by Harold Jeffreys.

“Leaving apart on one side the practical difficulty of achieving randomness and the meaning of this word when applied to actual experiments…”

“It may be useful to point out that although we are frequently witnessing controversies in which authors try to defend one or another system of the theory of probability as the only legitimate, I am of the opinion that several such theories may be and actually are legitimate, in spite of their occasionally contradicting one another. Each of these theories is based on some system of postulates, and so long as the postulates forming one particular system do not contradict each other and are sufficient to construct a theory, this is as legitimate as any other.”

This paper is fairly long in part because Neyman starts by setting Kolmogorov’s axioms of probability. This is of historical interest but also needed for Neyman to oppose his notion of probability to Jeffreys’ (which is the same from a formal perspective, I believe!). He actually spends a fair chunk on explaining why constants cannot have anything but trivial probability measures. Getting ready to state that an a priori distribution has no meaning (p.343) and that in the rare cases it does it is mostly unknown. While reading the paper, I thought that the distinction was more in terms of frequentist or conditional properties of the estimators, Neyman’s arguments paving the way to his definition of a confidence interval. Assuming repeatability of the experiment under the same conditions and therefore same parameter value (p.344).

“The advantage of the unbiassed [sic] estimates and the justification of their use lies in the fact that in cases frequently met the probability of their differing very much from the estimated parameters is small.”

“…the maximum likelihood estimates appear to be what could be called the best “almost unbiassed [sic]” estimates.”

It is also quite interesting to read that the principle for insisting on unbiasedness is one of producing small errors, because this is not that often the case, as shown by the complete class theorems of Wald (ten years later). And that maximum likelihood is somewhat relegated to a secondary rank, almost unbiased being understood as consistent. A most amusing part of the paper is when Neyman inverts the credible set into a confidence set, that is, turning what is random into a constant and vice-versa. With a justification that the credible interval has zero or one coverage, while the confidence interval has a long-run validity of returning the correct rate of success. What is equally amusing is that the boundaries of a credible interval turn into functions of the sample, hence could be evaluated on a frequentist basis, as done later by Dennis Lindley and others like Welch and Peers, but that Neyman fails to see this and turns the bounds into hard values. For a given sample.
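The contrast between long-run validity and the zero-or-one status of a realised interval can be checked by simulation. What follows is a minimal sketch under assumptions of my own, not Neyman's construction: a Normal sample with known standard deviation, the usual 95% interval around the sample mean, and a hypothetical helper named `ci_coverage`. Each realised interval either contains the fixed θ or does not (a boolean), while the proportion of successes over repeated experiments settles near the nominal level.

```python
import numpy as np

def ci_coverage(theta=2.0, sigma=1.0, n=20, n_rep=20_000, seed=1):
    """Long-run coverage of the known-sigma 95% interval for a Normal mean."""
    rng = np.random.default_rng(seed)
    # n_rep independent repetitions of the same experiment, same fixed theta
    x = rng.normal(theta, sigma, size=(n_rep, n))
    xbar = x.mean(axis=1)
    half_width = 1.959964 * sigma / np.sqrt(n)  # 97.5% Normal quantile
    # For a given sample, the interval covers theta or it does not
    covered = (xbar - half_width <= theta) & (theta <= xbar + half_width)
    return covered.mean()

coverage = ci_coverage()
print(coverage)  # close to the nominal 0.95, up to Monte Carlo error
```

The same loop applies unchanged to the bounds of a credible interval, since those are functions of the sample as well, which is the frequentist evaluation alluded to above.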

“This, however, is not always the case, and in general there are two or more systems of confidence intervals possible corresponding to the same confidence coefficient α, such that for certain sample points, E’, the intervals in one system are shorter than those in the other, while for some other sample points, E”, the reverse is true.”

The resulting construction of a confidence interval is then awfully convoluted when compared with the derivation of an HPD region, going through regions of acceptance that are the dual of a confidence interval (in the sampling space), while apparently [from my hasty read] missing a rule to order them. And rejecting the notion of a confidence interval being possibly empty, which, while being of practical interest, clashes with its frequentist backup.

## exoplanets at 99.999…%

Posted in Books, pictures, Statistics, University life with tags astrostatistics, book reviews, confidence intervals, False positive, Monte Carlo technique, Significance on January 22, 2016 by xi'an

**T**he latest Significance has a short article providing some coverage of the growing trend in the discovery of exoplanets, including new techniques used to detect those exoplanets from their impact on the associated stars. This [presumably] comes from the recent book *Cosmos: The Infographics Book of Space* *[a side comment: new books seem to provide material for many articles in Significance these days!]* and the above graph is also from the *book*, not the ultimate infographic representation in my opinion given that a simple superposition of lines could do as well. Or better.

“A common approach to ruling out these sorts of false positives involves running sophisticated numerical algorithms, called Monte Carlo simulations, to explore a wide range of blend scenarios (…) A new planet discovery needs to have a confidence of (…) a one in a million chance that the result is in error.”

The above sentence is obviously of interest, first because the detection of false positives by Monte Carlo hints at a rough version of ABC to assess the likelihood of the observed phenomenon under the null [no detail provided] and second because the probability statement in the end is quite unclear as to its foundations… Reminding me of the Higgs boson controversy. The very last sentence of the article is however brilliant, albeit maybe unintentionally so:

“To date, 1900 confirmed discoveries have been made. We have certainly come a long way from 1989.”

Yes, 89 down, strictly speaking!

## evolution of correlations [award paper]

Posted in Books, pictures, Statistics, University life with tags best paper award, bootstrap, confidence intervals, empirical correlation, Journal of Research in Personality, precision on September 15, 2015 by xi'an

“Many researchers might have observed that the magnitude of a correlation is pretty unstable in small samples.”

**O**n the statsblog aggregator, I spotted an entry that eventually led me to this post about the best paper award for the evolution of correlation, a paper published in the *Journal of Research in Personality*. A journal not particularly well-known for its statistical methodology input. The main message of the paper is that, while the empirical correlation is highly varying for small n’s, an interval (or *corridor of stability*!) can be constructed so that a Z-transform of the correlation does not vary away from the true value by more than a chosen quantity like 0.1. And the *point of stability* is then defined as the sample size after which the trajectory of the estimate does not leave the corridor… Both corridor and point depending on the true and unknown value of the correlation parameter by the by. Which implies resorting to bootstrap to assess the distribution of this point of stability. And deduce quantiles that can be used for… For what exactly?! Setting the necessary sample size? But this requires a preliminary run to assess the possible value of the true correlation ρ. The paper concludes that “for typical research scenarios reasonable trade-offs between accuracy and confidence start to be achieved when n approaches 250”. This figure was achieved by a bootstrap study on a bivariate Gaussian population with 10⁶ datapoints, yes indeed 10⁶!, and bootstrap samples of maximal size 10³. All in all, while I am at a loss as to why the *Journal of Research in Personality* would promote the estimation of a correlation coefficient with 250 datapoints, there is nothing fundamentally wrong with the paper (!), except for this recommendation of the 250 datapoints, as the figure stems from a specific setting with particular calibrations and cannot be expected to apply in every and all cases.
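As a rough illustration of these two definitions, here is a minimal sketch for a single trajectory. It follows my reading of the description above rather than the paper's own code: the corridor is taken as the set of estimates whose Fisher z-transform lies within w = 0.1 of the z-transform of the true ρ, and the point of stability as the first sample size after the last exit from that corridor. The names `correlation_trajectory` and `point_of_stability`, the minimal size of 20 and the value ρ = 0.3 are all assumptions made for the example.

```python
import numpy as np

def correlation_trajectory(x, y, n_min=20):
    """Empirical correlation of the first n pairs, for n = n_min, ..., len(x)."""
    sizes = np.arange(n_min, len(x) + 1)
    r = np.array([np.corrcoef(x[:n], y[:n])[0, 1] for n in sizes])
    return sizes, r

def point_of_stability(sizes, r, rho, w=0.1):
    """First sample size after which the trajectory stays inside the corridor."""
    # Distance to the true value on the Fisher z scale
    z_gap = np.abs(np.arctanh(r) - np.arctanh(rho))
    outside = np.flatnonzero(z_gap > w)
    if outside.size == 0:
        return int(sizes[0])           # never left the corridor
    if outside[-1] == len(sizes) - 1:
        return None                    # still outside at the largest size
    return int(sizes[outside[-1] + 1])

rng = np.random.default_rng(2)
rho = 0.3  # the true correlation, needed to define the corridor
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1000)
sizes, r = correlation_trajectory(xy[:, 0], xy[:, 1])
pos = point_of_stability(sizes, r, rho)
print(pos)
```

Note that `rho` enters `point_of_stability` as an argument, which makes the dependence on the unknown parameter explicit: with real data this value is not available and has to be guessed or estimated beforehand.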

Actually, the graph in the paper was the thing that first attracted my attention because it looks very much like the bootstrap examples I show my third year students to demonstrate the appeal of bootstrap. Which is not particularly useful in the current case. A quick simulation on 100 samples of size 300 showed [above] that Monte Carlo simulations produce a tighter confidence band than the one created by bootstrap, in the Gaussian case.
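For readers who want to reproduce this kind of Monte Carlo band, here is a minimal sketch. It is not the code behind the graph: it assumes a bivariate Gaussian with ρ = 0.3, a smallest sample size of 20, and a pointwise 95% band computed over 100 independent samples of size 300. The helper name `running_correlations` is hypothetical, and no bootstrap comparison is included.

```python
import numpy as np

def running_correlations(x, y, n_min=20):
    """Row-wise running correlations for arrays of shape (n_samples, n_max)."""
    n = np.arange(1, x.shape[1] + 1)
    sx, sy = x.cumsum(axis=1), y.cumsum(axis=1)
    # Centred cross-products and sums of squares from cumulative sums
    cov = (x * y).cumsum(axis=1) - sx * sy / n
    vx = (x * x).cumsum(axis=1) - sx**2 / n
    vy = (y * y).cumsum(axis=1) - sy**2 / n
    keep = slice(n_min - 1, None)  # drop the smallest, degenerate sizes
    return n[keep], cov[:, keep] / np.sqrt(vx[:, keep] * vy[:, keep])

rng = np.random.default_rng(3)
rho, n_samples, n_max = 0.3, 100, 300
z = rng.standard_normal((2, n_samples, n_max))
x = z[0]
y = rho * z[0] + np.sqrt(1 - rho**2) * z[1]  # correlation rho with x

sizes, r = running_correlations(x, y)
# Pointwise 2.5% and 97.5% quantiles across the 100 trajectories
lower, upper = np.quantile(r, [0.025, 0.975], axis=0)
print(sizes[-1], lower[-1], upper[-1])
```

Each row of `r` is one trajectory of the estimate as the sample grows, so plotting `lower` and `upper` against `sizes` gives the funnel-shaped band, simulated directly from the model rather than resampled from one large population.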

## Bureau international des poids et mesures [bayésiennes?]

Posted in pictures, Statistics, Travel with tags admissibility, Bayesian inference, Bureau international des poids et mesures, confidence intervals, conventions, France, frequentist inference, MaxEnt, norms, Paris, Pavillon de Breteuil, Sèvres, subjective versus objective Bayes, workshop on June 19, 2015 by xi'an

**T**he workshop at the BIPM on measurement uncertainty was certainly most exciting, first by its location in the Parc de Saint Cloud in classical buildings overlooking the Seine river in a most bucolic manner…and second by its mostly Bayesian flavour. The recommendations that the workshop addressed are about revisions in the current GUM, which stands for the Guide to the Expression of Uncertainty in Measurement. The discussion centred on using a more Bayesian approach than in the earlier version, with the organisers of the workshop and leaders of the revision apparently most in favour of that move. “Knowledge-based pdfs” came into the discussion as an attractive notion since it rings a Bayesian bell, especially when associated with probability as a degree of belief and incorporating the notion of an a priori probability distribution. And propagation of errors. Or even more when mentioning the removal of frequentist validations. What I gathered from the talks is the perspective drifting away from central limit approximations to more realistic representations, calling for Monte Carlo computations. There is also a lot I did not get about conventions, codes and standards. Including a short debate about the different meanings of Monte Carlo, from simulation technique to calculation method (as for confidence intervals). And another discussion about switching the old formula for estimating sd from the Normal to the Student’s *t* case. A change that remains highly debatable since the Student’s *t* assumption is as shaky as the Normal one. What became clear [to me] during the meeting is that a rather heated debate is currently taking place about the need for a revision, with some members of the six (?) organisations involved arguing against Bayesian or linearisation tools.

This became even clearer during our frequentist versus Bayesian session with a first talk so outrageously anti-Bayesian it was hilarious! Among other things, the notion that “fixing” the data was against the principles of physics (the speaker was a physicist), that the only randomness in a Bayesian coin tossing was coming from the prior, that the likelihood function was a subjective construct, that the definition of the posterior density was a generalisation of Bayes’ theorem [generalisation found in… Bayes’ 1763 paper then!], that objective Bayes methods were inconsistent [because Jeffreys’ prior produces an inadmissible estimator of μ²!], that the move to Bayesian principles in GUM would cost the New Zealand economy 5 billion dollars [hopefully a frequentist estimate!], &tc., &tc. The second pro-frequentist speaker was by comparison much much more reasonable, although he insisted on showing Bayesian credible intervals do not achieve a nominal frequentist coverage, using a sort of fiducial argument distinguishing x=X+ε from X=x+ε that I missed… A lack of achievement that is fine by my standards. Indeed, a frequentist confidence interval provides a coverage guarantee either for a fixed parameter (in which case the Bayesian approach achieves better coverage by constant updating) or a varying parameter (in which case the frequency of proper inclusion is of no real interest!). The first Bayesian speaker was Tony O’Hagan, who summarily tore the first talk to shreds. And also criticised GUM2 for using reference priors and maxent priors. I am afraid my talk was a bit too exploratory for the audience (since I got absolutely no question!). In retrospect, I should have given an intro to reference priors.

An interesting specificity of a workshop on metrology and measurement is that they are hard sticklers for the schedule, starting and finishing right on time. When a talk finished early, we waited until the intended time for the next talk. Not even allowing for extra discussion. When the only overtime and Belgian speaker ran close to 10 minutes late, I was afraid he would (deservedly) get lynched! He escaped unscathed, but may (and should) not get invited again..!