## double yolk priors [a reply from the authors]

Posted in Books, Statistics, University life on March 14, 2018 by xi'an

[Here is an email I received from Subhadeep Mukhopadhyay, one of the authors of the paper I discussed yesterday.]
Thanks for discussing our work. Let me clarify the technical point that you raised:
– The difference between Leg_j(u) and T_j=Leg_j(G(θ)). One is an orthonormal polynomial of L2[0,1] and the other one of L2[G]. The second one is a polynomial of the rank-transform G(θ).
– As you correctly pointed out there is a danger in directly approximating the ratio. We work on it after taking the quantile transform: evaluate the ratio at G⁻¹(u), which is the d(u;G,F) over the unit interval. Now, this new transformed function is a proper density.
– Thus the ratio now becomes d(G(θ)), which can be expanded (NOT in the Leg-basis) in the $T_j$, in eq (2.2), as it lives in the Hilbert space L2(G).
– For your last point on Step 2 of our algo, we can also use the simple integrate command.
– Unlike traditional prior-data conflict, here we attempted to answer three questions in one shot: (i) How compatible is the pre-selected g with the given data? (ii) In the event of a conflict, can we also inform the user on the nature of misfit–finer structure that was a priori unanticipated? (iii) Finally, we would like to provide a simple, yet formal guideline for upgrading (repairing) the starting g.
Hopefully, this will clear the air. But thanks for reading the paper so carefully. Appreciate it.

## double yolk priors

Posted in Statistics on March 13, 2018 by xi'an

“To develop a “defendable and defensible” Bayesian learning model, we have to go beyond blindly ‘turning the crank’ based on a “go-as-you-like” [approximate guess] prior. A lackluster attitude towards prior modeling could lead to disastrous inference, impacting various fields from clinical drug development to presidential election forecasts. The real questions are: How can we uncover the blind spots of the conventional wisdom-based prior? How can we develop the science of prior model-building that combines both data and science [DS-prior] in a testable manner – a double-yolk Bayesian egg?”

I came through R bloggers on this presentation of a paper by Subhadeep Mukhopadhyay and Douglas Fletcher, Bayesian modelling via goodness of fit, that aims at solving all existing problems with classical Bayesian solutions, apparently! (With also apparently no awareness of David Spiegelhalter’s take on the matter.) As illustrated by both quotes, above and below:

“The two key issues of modern Bayesian statistics are: (i) establishing principled approach for distilling statistical prior that is consistent with the given data from an initial believable scientific prior; and (ii) development of a Bayes-frequentist consolidated data analysis workflow that is more effective than either of the two separately.”

(I wonder who else in this Universe would characterise “modern Bayesian statistics” in such a non-Bayesian way! And love the notion of distillation applied to priors!) The setup is actually one of empirical Bayes inference where repeated values of the parameter θ drawn from the prior are behind independent observations. Which is not the usual framework for a statistical analysis, where a single value of the parameter is supposed to hide behind the data, but most convenient for frequency based arguments behind empirical Bayes methods (which is the case here). The paper adopts a far-from-modern discourse on the “truth” of “the” prior… (Which is always conjugate in that Universe!) Instead of recognising the relativity of a statistical analysis based on a given prior.

When I tried to read the paper any further, I hit a wall as I could not understand the principle described therein. And how it “consolidates Bayes and frequentist, parametric and nonparametric, subjective and objective, quantile and information-theoretic philosophies.” Presumably the lack of oxygen at the altitude of Chamonix… Given an “initial guess” at the prior, g, a conjugate prior (in dimension one with an invertible cdf), a family of priors is created in what first looks like a form of non-parametric exponential tilting of g. But a closer look [at (2.1)] exposes the “family” as the tautological π(θ)=g(θ)×π(θ)/g(θ). The ratio is expanded into a Legendre polynomial series. Whose use in Bayesian statistics dates a wee bit further back than indicated in the paper (see, e.g., Friedman, 1985; Diaconis, 1986). With the side issue that the resulting approximation does not integrate to one. Another side issue is that the coefficients of the truncated Legendre series are approximated by simulations from the prior [Step 3 of the Type II algorithm], rarely an efficient approach to the posterior.
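To make the expansion concrete, here is a minimal Python sketch of the ratio π/g expanded in orthonormal shifted Legendre polynomials of G(θ), with coefficients estimated by simulation. Everything specific in it is an assumption chosen for illustration and not taken from the paper: the initial guess g is Exp(1), the "true" prior π is Exp(2), and the function names are hypothetical. With these choices the ratio on the unit scale is d(u)=2(1-u), so the only non-zero coefficient beyond the constant is the first one, equal to -1/√3.

```python
import math
import random

def shifted_legendre(j, u):
    """Orthonormal shifted Legendre polynomial of degree j on [0, 1]."""
    if j == 0:
        return 1.0
    x = 2.0 * u - 1.0          # map [0, 1] to [-1, 1]
    p_prev, p = 1.0, x         # P_0 and P_1
    for n in range(1, j):
        # Bonnet recurrence: (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return math.sqrt(2 * j + 1) * p

def G(theta):
    """Cdf of the assumed initial guess g = Exp(1)."""
    return 1.0 - math.exp(-theta)

def estimate_coefficients(max_degree=3, n_draws=100_000, seed=42):
    """Monte Carlo estimates of E_pi[Leg_j(G(theta))], j = 1..max_degree."""
    rng = random.Random(seed)
    # theta is drawn from the assumed "true" prior pi = Exp(rate 2)
    us = [G(rng.expovariate(2.0)) for _ in range(n_draws)]
    return [sum(shifted_legendre(j, u) for u in us) / n_draws
            for j in range(1, max_degree + 1)]

def d_hat(u, coefs):
    """Truncated expansion 1 + sum_j coefs[j-1] * Leg_j(u)."""
    return 1.0 + sum(c * shifted_legendre(j, u)
                     for j, c in enumerate(coefs, start=1))

coefs = estimate_coefficients()
print(coefs)               # exact values in this toy case: [-1/sqrt(3), 0, 0]
print(d_hat(0.25, coefs))  # exact d(0.25) = 2 * (1 - 0.25) = 1.5
```

The sketch draws from π directly, which is only possible because π is known in this toy case; it is not the Type II algorithm of the paper.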

## distributions for parameters [seminar]

Posted in Books, Statistics, University life on January 22, 2018 by xi'an
Next Thursday, January 25, Nancy Reid will give a seminar in Paris-Dauphine on distributions for parameters that covers different statistical paradigms and brings a new light on the foundations of statistics. (Coffee is at 10am in the Maths department common room and the talk is at 10:15 in room A, second floor.)

Nancy Reid is University Professor of Statistical Sciences and the Canada Research Chair in Statistical Theory and Applications at the University of Toronto and an internationally acclaimed statistician, as well as a 2014 Fellow of the Royal Society of Canada. In 2015, she received the Order of Canada, was elected a foreign associate of the National Academy of Sciences in 2016 and has been awarded many other prestigious statistical and science honours, including the Committee of Presidents of Statistical Societies (COPSS) Award in 1992.

Nancy Reid’s research focuses on finding more accurate and efficient methods to deduce and conclude facts from complex data sets to ultimately help scientists find specific solutions to specific problems.

There is currently some renewed interest in developing distributions for parameters, often without relying on prior probability measures. Several approaches have been proposed and discussed in the literature and in a series of “Bayes, fiducial, and frequentist” workshops and meeting sessions. Confidence distributions, generalized fiducial inference, inferential models, belief functions are some of the terms associated with these approaches. I will survey some of this work, with particular emphasis on common elements and calibration properties. I will try to situate the discussion in the context of the current explosion of interest in big data and data science.

## en route to Boston!

Posted in pictures, Running, Travel, University life on April 29, 2017 by xi'an

## beyond objectivity, subjectivity, and other ‘bjectivities

Posted in Statistics on April 12, 2017 by xi'an

Here is my discussion of Gelman and Hennig at the Royal Statistical Society, which I am about to deliver!

Posted in Books, pictures, Statistics, Travel, University life, Wines on April 5, 2017 by xi'an

Andrew Gelman and Christian Hennig will give a Read Paper presentation next Wednesday, April 12, 5pm, at the Royal Statistical Society, London, on their paper “Beyond subjective and objective in statistics”. Which I hope to attend or else to write a discussion. Since the discussion (to be published in Series A) is open to everyone, I strongly encourage ‘Og’s readers to take a look at the paper and the “radical” views therein to hopefully contribute to this discussion. Either as a written discussion or as comments on this very post.

## Validity and the foundations of statistical inference

Posted in Statistics on July 29, 2016 by xi'an

Natesh pointed out to me this recent arXival with a somewhat grandiose abstract:

In this paper, we argue that the primary goal of the foundations of statistics is to provide data analysts with a set of guiding principles that are guaranteed to lead to valid statistical inference. This leads to two new questions: “what is valid statistical inference?” and “do existing methods achieve this?” Towards answering these questions, this paper makes three contributions. First, we express statistical inference as a process of converting observations into degrees of belief, and we give a clear mathematical definition of what it means for statistical inference to be valid. Second, we evaluate existing Bayesian and frequentist approaches relative to this definition and conclude that, in general, these fail to provide valid statistical inference. This motivates a new way of thinking, and our third contribution is a demonstration that the inferential model framework meets the proposed criteria for valid and prior-free statistical inference, thereby solving perhaps the most important unsolved problem in statistics.

Since solving the “most important unsolved problem in statistics” sounds worth pursuing, I went and checked the paper’s contents.

“To us, the primary goal of the foundations of statistics is to provide a set of guiding principles that, if followed, will guarantee validity of the resulting inference. Our motivation for writing this paper is to be clear about what is meant by valid inference and to provide the necessary principles to help data analysts achieve validity.”

Which can be interpreted in so many ways that it is somewhat meaningless…

“…if real subjective prior information is available, we recommend using it. However, there is an expanding collection of work (e.g., machine learning, etc) that takes the perspective that no real prior information is available. Even a large part of the literature claiming to be Bayesian has abandoned the interpretation of the prior as a serious part of the model, opting for “default” prior that “works.” Our choice to omit a prior from the model is not for the (misleading) purpose of being “objective”—subjectivity is necessary—but, rather, for the purpose of exploring what can be done in cases where a fully satisfactory prior is not available, to see what improvements can be made over the status quo.”

This is a pretty traditional criticism of the Bayesian approach, namely that if a “true” prior is provided (by whom?) then it is optimal to use it. But this amounts to turning the prior into another piece of the sampling distribution and is not in my opinion a Bayesian argument! Most of the criticisms in the paper are directed at objective Bayes approaches, with the surprising conclusion that, because there exist cases where no matching prior is available, “the objective Bayesian approach [cannot] be considered as a general framework for scientific inference.” (p.9)

Another section argues that a Bayesian modelling cannot describe a state of total ignorance. This is formally correct, which is why there is no such thing as a non-informative or the non-informative prior, as often discussed here, but is this truly relevant, in that the inference problem contains one way or another information about the parameter, for instance through a loss function or a pseudo-likelihood?

“This is a desirable property that most existing methods lack.”

The proposal central to the paper’s thesis is to replace posterior probabilities by belief functions b(.|X), called statistical inference, that are interpreted as measures of evidence about subsets A of the parameter space. If not necessarily as probabilities. This is not very novel, witness the works of Dempster, Shafer and subsequent researchers. And not very much used outside Bayesian and fiducial statistics because of the mostly impossible task of defining a function over all subsets of the parameter space. Because of the subjectivity of such “beliefs”, they will be “valid” only if they are well-calibrated in the sense of b(A|X) being sub-uniform, that is, more concentrated near zero than a uniform variate (i.e., small) under the alternative, i.e. when θ is not in A. At this stage, since this is a mix of a minimax and proper coverage condition, my interest started to quickly wane… Especially because the sub-uniformity condition is highly demanding, if leading to controls over the Type I error and the frequentist coverage. As often, I wonder at the meaning of a calibration property obtained over all realisations of the random variable and all values of the parameter. So for me stability is neither “desirable” nor “essential”. Overall, I have increasing difficulties in perceiving proper coverage as a relevant property. Which has no stronger or weaker meaning than the coverage derived from a Bayesian construction.
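For reference, one way to write the sub-uniformity requirement, in a notation assumed here rather than copied from the paper (α is a level in (0,1) and $P_\theta$ the sampling distribution of X), is

$$\sup_{\theta\notin A} P_\theta\{\,b(A|X)\ge 1-\alpha\,\}\le\alpha\qquad\text{for all }\alpha\in(0,1),$$

that is, when θ lies outside A, the belief b(A|X) exceeds 1-α with probability at most α, which is what "stochastically no larger than a uniform variate" means.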

“…frequentism does not provide any guidance for selecting a particular rule or procedure.”

I agree with this assessment, which means that there is no such thing as frequentist inference, but rather a philosophy for assessing procedures. That the Gleser-Hwang paradox invalidates this philosophy sounds a bit excessive, however. Especially when the bounded nature of Bayesian credible intervals is also analysed as a failure. A more relevant criticism is the lack of directives for picking procedures.

“…we are the first to recognize that the belief function’s properties are necessary in order for the inferential output to satisfy the required validity property”

The construction of the “inferential model” proposed by the authors offers similarities with fiducial inference, in that it builds upon the representation of the observable X as X=a(θ,U). With further constraints on the function a() to ensure the validity condition holds… An interesting point is that the functional connection X=a(θ,U) means that the nature of U changes once X is observed, albeit in a delicate manner outside a Bayesian framework. When illustrated on the Gleser-Hwang paradox, the resolution proceeds from an arbitrary choice of a one-dimensional summary, though. (As I am reading the paper, I realise it builds on other and earlier papers by the authors, papers that I cannot read for lack of time. I must have listened to a talk by one of the authors last year at JSM as this rings a bell. Somewhat.) In conclusion of a quick Sunday afternoon read, I am not convinced by the arguments in the paper and even less by the impression of a remaining arbitrariness in setting the resulting procedure.
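As a toy illustration of the representation X=a(θ,U), the sketch below takes the Gaussian location case a(θ,u)=θ+u with U~N(0,1) and applies the plain fiducial argument: once x is observed, each draw of U is inverted into a value of θ. The model, the function names and the observed value 1.3 are assumptions made for illustration only. This is the classical fiducial inversion, not the authors' inferential model construction, which adds further constraints.

```python
import random
import statistics

def association(theta, u):
    """Hypothetical association a(theta, u) for a Gaussian location model."""
    return theta + u

def fiducial_draws(x_obs, n_draws=50_000, seed=1):
    """Solve x_obs = a(theta, u) in theta for fresh draws u ~ N(0, 1)."""
    rng = random.Random(seed)
    return [x_obs - rng.gauss(0.0, 1.0) for _ in range(n_draws)]

draws = fiducial_draws(x_obs=1.3)
# In this toy case the fiducial distribution of theta is N(x_obs, 1)
print(statistics.mean(draws), statistics.stdev(draws))
```

In this special case the fiducial distribution coincides with the posterior under a flat prior, which is why the example says nothing about the harder cases discussed in the paper.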