## confidence in confidence

Posted in Statistics, University life with tags , , , , on June 8, 2022 by xi'an

[This is a ghost post that I wrote eons ago and which got lost in the meanwhile.]

Following the false confidence paper, Céline Cunen, Niels Hjort & Tore Schweder wrote a short paper in the same Proceedings A defending confidence distributions. And blame the phenomenon on Bayesian tools, which “might have unfortunate frequentist properties”. Which comes as no surprise since Tore Schweder and Nils Hjort wrote a book promoting confidence distributions for statistical inference.

“…there will never be any false confidence, and we can trust the obtained confidence! “

Their re-analysis of Balch et al (2019) is that using a flat prior on the location (of a satellite) leads to a non-central chi-square distribution as the posterior on the squared distance δ² (between two satellites). Which incidentally happens to be a case pointed out by Jeffreys (1939) against the use of the flat prior as δ² has a constant bias of d (the dimension of the space) plus the non-centrality parameter. And offers a neat contrast between the posterior, with non-central chi-squared cdf with two degrees of freedom

$F(\delta)=\Gamma_2(\delta^2/\sigma^2;||y||^2/\sigma^2)$

and the confidence “cumulative distribution”

$C(\delta)=1-\Gamma_2(|y||^2/\sigma^2;\delta^2/\sigma^2)$

Cunen et al (2020) argue that the frequentist properties of the confidence distribution 1-C(R), where R is the impact distance, are robust to an increasing σ when the true value is also R. Which does not seem to demonstrate much. A second illustration of B and C when the distance δ varies and both σ and |y|² are fixed is even more puzzling when the authors criticize the Bayesian credible interval for missing the “true” value of δ, as I find the statement meaningless for a fixed value of |y|²… Looking forward the third round!, i.e. a rebuttal by Balch et al (2019)

## simplified Bayesian analysis

Posted in Statistics with tags , , , , , , , , , , , , on February 10, 2021 by xi'an

A colleague from Dauphine sent me a paper by Carlo Graziani on a Bayesian analysis of vaccine efficiency, asking for my opinion. The Bayesian side is quite simple: given two Poisson observations, N~P(μ) and M~P(ν), there exists a reparameterisation of (μ,ν) into

e=1-μ/rν  and  λ=ν(1+(1-e)r)=μ+ν

vaccine efficiency and expectation of N+M, respectively, when r is the vaccine-to-placebo ratio of person-times at risk, ie the ratio of the numbers of participants in each group. Reparameterisation such that the likelihood factorises into a function of e and a function of λ. Using a product prior for this parameterisation leads to a posterior on e times a posterior on λ. This is a nice remark, which may have been made earlier (as for instance another approach to infer about e while treating λ as a nuisance parameter is to condition on N+M). The paper then proposes as an application of this remark an analysis of the results of three SARS-Cov-2 vaccines, meaning using the pairs (N,M) for each vaccine and deriving credible intervals, which sounds more like an exercise in basic Bayesian inference than a fundamental step in assessing the efficiency of the vaccines…

## [Nature on] simulations driving the world’s response to COVID-19

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on April 30, 2020 by xi'an

Nature of 02 April 2020 has a special section on simulation methods used to assess and predict the pandemic evolution. Calling for caution as the models used therein, like the standard ODE S(E)IR models, which rely on assumptions on the spread of the data and very rarely on data, especially in the early stages of the pandemic. One epidemiologist is quote stating “We’re building simplified representations of reality” but this is not dire enough, as “simplified” evokes “less precise” rather than “possibly grossly misleading”. (The graph above is unrelated to the Nature cover and appears to me as particularly appalling in mixing different types of data, time-scale, population at risk, discontinuous updates, and essentially returning no information whatsoever.)

“[the model] requires information that can be only loosely estimated at the start of an epidemic, such as the proportion of infected people who die, and the basic reproduction number (…) rough estimates by epidemiologists who tried to piece together the virus’s basic properties from incomplete information in different countries during the pandemic’s early stages. Some parameters, meanwhile, must be entirely assumed.”

The report mentions that the team at Imperial College, which predictions impacted the UK Government decisions, also used an agent-based model, with more variability or stochasticity in individual actions, which require even more assumptions or much more refined, representative, and trustworthy data.

“Unfortunately, during a pandemic it is hard to get data — such as on infection rates — against which to judge a model’s projections.”

Unfortunately, the paper was written in the early days of the rise of cases in the UK, which means predictions were not much opposed to actual numbers of deaths and hospitalisations. The following quote shows how far off they can fall from reality:

“the British response, Ferguson said on 25 March, makes him “reasonably confident” that total deaths in the United Kingdom will be held below 20,000.”

since the total number as of April 29 is above 21,000 24,000 29,750 and showing no sign of quickly slowing down… A quite useful general public article, nonetheless.

## calibrating approximate credible sets

Posted in Books, Statistics with tags , , , , , , , on October 26, 2018 by xi'an

Earlier this week, Jeong Eun Lee, Geoff Nicholls, and Robin Ryder arXived a paper on the calibration of approximate Bayesian credible intervals. (Warning: all three authors are good friends of mine!) They start from the core observation that dates back to Monahan and Boos (1992) of exchangeability between θ being generated from the prior and φ being generated from the posterior associated with one observation generated from the prior predictive. (There is no name for this distribution, other than the prior, that is!) A setting amenable to ABC considerations! Actually, Prangle et al. (2014) relies on this property for assessing the ABC error, while pointing out that the test for exchangeability is not fool-proof since it works equally for two generations from the prior.

“The diagnostic tools we have described cannot be “fooled” in quite the same way checks based on the exchangeability can be.”

The paper thus proposes methods for computing the coverage [under the true posterior] of a credible set computed using an approximate posterior. (I had to fire up a few neurons to realise this was the right perspective, rather than the reverse!) A first solution to approximate the exact coverage of the approximate credible set is to use logistic regression, instead of the exact coverage, based on some summary statistics [not necessarily in an ABC framework]. And a simulation outcome that the parameter [simulated from the prior] at the source of the simulated data is within the credible set. Another approach is to use importance sampling when simulating from the pseudo-posterior. However this sounds dangerously close to resorting to an harmonic mean estimate, since the importance weight is the inverse of the approximate likelihood function. Not that anything unseemly transpires from the simulations.

## X-Outline of a Theory of Statistical Estimation

Posted in Books, Statistics, University life with tags , , , , , , , , , , on March 23, 2017 by xi'an

While visiting Warwick last week, Jean-Michel Marin pointed out and forwarded me this remarkable paper of Jerzy Neyman, published in 1937, and presented to the Royal Society by Harold Jeffreys.

“Leaving apart on one side the practical difficulty of achieving randomness and the meaning of this word when applied to actual experiments…”

“It may be useful to point out that although we are frequently witnessing controversies in which authors try to defend one or another system of the theory of probability as the only legitimate, I am of the opinion that several such theories may be and actually are legitimate, in spite of their occasionally contradicting one another. Each of these theories is based on some system of postulates, and so long as the postulates forming one particular system do not contradict each other and are sufficient to construct a theory, this is as legitimate as any other. “

This paper is fairly long in part because Neyman starts by setting Kolmogorov’s axioms of probability. This is of historical interest but also needed for Neyman to oppose his notion of probability to Jeffreys’ (which is the same from a formal perspective, I believe!). He actually spends a fair chunk on explaining why constants cannot have anything but trivial probability measures. Getting ready to state that an a priori distribution has no meaning (p.343) and that in the rare cases it does it is mostly unknown. While reading the paper, I thought that the distinction was more in terms of frequentist or conditional properties of the estimators, Neyman’s arguments paving the way to his definition of a confidence interval. Assuming repeatability of the experiment under the same conditions and therefore same parameter value (p.344).

“The advantage of the unbiassed [sic] estimates and the justification of their use lies in the fact that in cases frequently met the probability of their differing very much from the estimated parameters is small.”

“…the maximum likelihood estimates appear to be what could be called the best “almost unbiassed [sic]” estimates.”

It is also quite interesting to read that the principle for insisting on unbiasedness is one of producing small errors, because this is not that often the case, as shown by the complete class theorems of Wald (ten years later). And that maximum likelihood is somewhat relegated to a secondary rank, almost unbiased being understood as consistent. A most amusing part of the paper is when Neyman inverts the credible set into a confidence set, that is, turning what is random in a constant and vice-versa. With a justification that the credible interval has zero or one coverage, while the confidence interval has a long-run validity of returning the correct rate of success. What is equally amusing is that the boundaries of a credible interval turn into functions of the sample, hence could be evaluated on a frequentist basis, as done later by Dennis Lindley and others like Welch and Peers, but that Neyman fails to see this and turn the bounds into hard values. For a given sample.

“This, however, is not always the case, and in general there are two or more systems of confidence intervals possible corresponding to the same confidence coefficient α, such that for certain sample points, E’, the intervals in one system are shorter than those in the other, while for some other sample points, E”, the reverse is true.”

The resulting construction of a confidence interval is then awfully convoluted when compared with the derivation of an HPD region, going through regions of acceptance that are the dual of a confidence interval (in the sampling space), while apparently [from my hasty read] missing a rule to order them. And rejecting the notion of a confidence interval being possibly empty, which, while being of practical interest, clashes with its frequentist backup.