Archive for Ronald Fisher

Deirdre McCloskey in Le Monde

Posted in Statistics on January 13, 2020 by xi'an

fiducial inference

Posted in Books, Mountains, pictures, Running, Statistics, Travel on October 30, 2017 by xi'an

In connection with my recent tale of the many ε’s, I received from Gunnar Taraldsen [from Trondheim, Norge] a paper [jointly written with Bo Lindqvist and just appeared on-line in JSPI] on conditional fiducial models.

“The role of the prior and the statistical model in Bayesian analysis is replaced by the use of the fiducial model x=R(θ,ε) in fiducial inference. The fiducial is obtained in this case without a prior distribution for the parameter.”

Reading this paper after addressing the X validated question made me understand better the fundamental wrongness of fiducial analysis! If I may herein object to Fisher himself… Indeed, when writing x=R(θ,ε), as the representation of the [observed] random variable x as a deterministic transform of a parameter θ and of an [unobserved] random factor ε, the two random variables x and ε are based on the same random preimage ω, i.e., x=x(ω) and ε=ε(ω). Observing x hence sets a massive constraint on the preimage ω and on the conditional distribution of ε=ε(ω). When the fiducial inference incorporates another level of randomness via an independent random variable ε’ and inverts x=R(θ,ε’) into θ=θ(x,ε’), assuming there is only one solution to the inversion, it modifies the nature of the underlying σ-algebra into something that is incompatible with the original model, because of this sudden duplication of the random variates. While the inversion of the equation x=R(θ,ε’) gives an idea of the possible values of θ when ε’ varies according to its [prior] distribution, it does not account for the connection between x and ε. And does not turn the original parameter into a random variable with an implicit prior distribution.
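To make the duplication concrete, here is a minimal sketch (with an assumed Normal location model x=θ+ε, not an example from the paper) of the fiducial inversion step, where the fresh ε’ is drawn with no reference to the ε that actually produced x:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fiducial model: x = R(theta, eps) = theta + eps, with eps ~ N(0, 1).
# The fiducial "inversion" draws a fresh eps', independent of the eps
# that actually generated x, and sets theta = x - eps'. This is exactly
# the duplication of random variates objected to above.
x_obs = 1.7                                 # the observed value of x
eps_prime = rng.standard_normal(100_000)    # fresh eps', ignoring x
theta_fiducial = x_obs - eps_prime          # fiducial draws of theta

# In this location model the fiducial distribution is N(x_obs, 1),
# which happens to coincide with the posterior under a flat prior.
print(theta_fiducial.mean(), theta_fiducial.std())
```

That the output matches a flat-prior posterior here is a peculiarity of location models, not a general vindication of the construction.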

As to conditional fiducial distributions, they are defined by inversion of x=R(θ,ε) under a certain constraint on θ, like C(θ)=0, which immediately raises a Pavlovian reaction in me: since the curve C(θ)=0 has measure zero under the original fiducial distribution, how can this conditional solution be defined at all, let alone uniquely? And how can it avoid the Borel paradox mentioned in the paper? If I get the meaning of the authors in this section, the resulting fiducial distribution will actually depend on the choice of σ-algebra governing the projection.

“A further advantage of the fiducial approach in the case of a simple fiducial model is that independent samples are produced directly from independent sampling from [the fiducial distribution]. Bayesian simulations most often come as dependent samples from a Markov chain.”

This side argument in “favour” of the fiducial approach is most curious as it brings into the picture computational aspects that do not have any reason to be there. (The core of the paper is concerned with the unicity of the fiducial distribution in some univariate settings. Not with computational issues.)

Barker at the Bernoulli factory

Posted in Books, Statistics on October 5, 2017 by xi'an

Yesterday, Flavio Gonçalves, Krzysztof Łatuszyński, and Gareth Roberts (Warwick) arXived a paper on Barker’s algorithm for Bayesian inference with intractable likelihoods.

“…roughly speaking Barker’s method is at worst half as good as Metropolis-Hastings.”

Barker’s acceptance probability (1965) is a smooth if less efficient version of Metropolis-Hastings. (Barker wrote his thesis in Adelaide, in the Mathematical Physics department. Most likely, he never interacted with Ronald Fisher, who died there in 1962.) This smoothness is exploited by devising a Bernoulli factory consisting of a 2-coin algorithm that manages to simulate the Bernoulli variable associated with the Barker probability, from a coin that can simulate Bernoullis with probabilities proportional to [bounded] π(θ), for instance using a bounded unbiased estimator of the target, and another coin that simulates another Bernoulli on a remainder term, assuming the bound on the estimate of π(θ) is known [or part of the remainder term]. This is a neat result in that it expands the range of pseudo-marginal methods (and resuscitates Barker’s formula from oblivion!). The paper includes an illustration in the case of the far-from-toyish Wright-Fisher diffusion. [Making Fisher and Barker meet, in the end!]
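The generic two-coin construction behind such Bernoulli factories can be sketched as follows; this is a bare-bones version under assumed known constants c₁, c₂, not the paper’s full algorithm for the Barker probability:

```python
import random

def two_coin(coin1, coin2, c1, c2, rng):
    """Return 1 with probability c1*p1 / (c1*p1 + c2*p2), using only
    flips of coin1 (heads with unknown probability p1) and coin2
    (heads with unknown probability p2), plus the known constants
    c1, c2. Applied to Barker's rule, the two coins stand for the
    [estimated] target values at the current and proposed states,
    and the output decides acceptance."""
    while True:
        # pick which coin to flip, proportionally to the constants
        if rng.random() < c1 / (c1 + c2):
            if coin1():          # heads on coin 1: output 1
                return 1
        else:
            if coin2():          # heads on coin 2: output 0
                return 0
        # tails on the selected coin: start the loop over

# Toy check: p1 = 0.3, p2 = 0.6, c1 = c2 = 1, so the target
# probability is 0.3 / (0.3 + 0.6) = 1/3
rng = random.Random(42)
coin1 = lambda: rng.random() < 0.3
coin2 = lambda: rng.random() < 0.6
freq = sum(two_coin(coin1, coin2, 1.0, 1.0, rng)
           for _ in range(20_000)) / 20_000
print(freq)
```

Per loop, the output is 1 with probability c₁p₁/(c₁+c₂) and 0 with probability c₂p₂/(c₁+c₂); conditioning on the loop terminating gives exactly c₁p₁/(c₁p₁+c₂p₂), with no evaluation of p₁ or p₂ required.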

inferential models: reasoning with uncertainty [book review]

Posted in Books, Statistics, University life on October 6, 2016 by xi'an

“the field of statistics (…) is still surprisingly underdeveloped (…) the subject lacks a solid theory for reasoning with uncertainty [and] there has been very little progress on the foundations of statistical inference” (p.xvi)

A book that starts with such massive assertions is certainly hoping to attract some degree of attention from the field and likely to induce strong reactions to this dismissal of the not inconsiderable amount of research dedicated so far to statistical inference and in particular to its foundations. Or even attracting flak for not accounting (in this introduction) for the past work of major statisticians, like Fisher, Kiefer, Lindley, Cox, Berger, Efron, Fraser and many many others… Judging from the references and the tone of this 254-page book, it seems like the two authors, Ryan Martin and Chuanhai Liu, truly aim at single-handedly resetting the foundations of statistics to their own tune, which sounds like a new kind of fiducial inference augmented with calibrated belief functions. Be warned that five chapters of this book are built on as many papers written by the authors in the past three years. Which makes me question, if I may, the relevance of publishing a book on a brand-new approach to statistics without further backup from a wider community.

“…it is possible to calibrate our belief probabilities for a common interpretation by intelligent minds.” (p.14)

Chapter 1 contains a description of the new perspective in Section 1.4.2, which I find useful to detail here. When given an observation x from a Normal N(θ,1) model, the authors rewrite X as θ+Z, with Z~N(0,1), as in fiducial inference, and then want to find a “meaningful prediction of Z independently of X”. This seems difficult to accept given that, once X=x is observed, Z=X-θ⁰, θ⁰ being the true value of θ, which belies the independence assumption. The next step is to replace Z~N(0,1) by a random set S(Z) containing Z and to define a belief function bel() on the parameter space Θ by

bel(A|X) = P(X-S(Z)⊆A)

which induces a pseudo-measure on Θ derived from the distribution of an independent Z, since X is already observed. When Z~N(0,1), this distribution does not depend on θ⁰ the true value of θ… The next step is to choose the belief function towards a proper frequentist coverage, in the approximate sense that the probability that bel(A|X) be more than 1-α is less than α when the [arbitrary] parameter θ is not in A. And conversely. This property (satisfied when bel(A|X) is uniform) is called validity or exact inference by the authors: in my opinion, restricted frequentist calibration would certainly sound more adequate.
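For the Normal toy example, the validity property can be checked by simulation; the sketch below assumes the default symmetric random set S(Z)=[−|Z|,|Z|], under which the region of θ values with plausibility at least α reduces to the usual x±z interval, so "validity" amounts to exact frequentist coverage:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)

# With S(Z) = [-|Z|, |Z|] and Z ~ N(0,1), the plausibility region for
# theta at level alpha is x ± z_{1-alpha/2}: checking validity is then
# checking the coverage of that interval under repeated sampling.
theta0 = 2.0                             # arbitrary true value of theta
alpha = 0.10
z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile

x = theta0 + rng.standard_normal(100_000)
coverage = np.mean((x - z <= theta0) & (theta0 <= x + z))
print(coverage)
```

In this toy case the construction recovers the classical confidence interval, which is why "restricted frequentist calibration" seems the more accurate label.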

“When there is no prior information available, [the philosophical justifications for Bayesian analysis] are less than fully convincing.” (p.30)

“Is it logical that an improper “ignorance” prior turns into a proper “non-ignorance” prior when combined with some incomplete information on the whereabouts of θ?” (p.44)


Revised evidence for statistical standards

Posted in Kids, Statistics, University life on December 19, 2013 by xi'an

We just submitted a letter to PNAS with Andrew Gelman last week, in reaction to Val Johnson’s recent paper “Revised standards for statistical evidence”, essentially summing up our earlier comments within 500 words. Actually, we wrote one draft each! In particular, Andrew came up with the (neat) rhetorical idea of alternative Ronald Fishers living in parallel universes who had each set a different significance reference level and for whom alternative Val Johnsons would rise and propose a modification of the corresponding Fisher’s level. For which I made the above graph, left out of the letter and its 500 words. It relates “the old z” and “the new z”, meaning the boundaries of the rejection zones when, for each golden dot, the “old z” is the previous “new z” and “the new z” is Johnson’s transform. We even figured out that Val’s transform was bringing the significance down by a factor of 10 in a large range of values. As an aside, we also wondered why most of the supplementary material was spent on deriving UMPBTs for specific (formal) problems when the goal of the paper sounded much more global…
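This is not Johnson’s actual UMPBT transform, but a hypothetical sketch of the kind of relation the graph displays: how a tenfold cut in significance moves the one-sided rejection boundary z at a few conventional levels.

```python
from statistics import NormalDist

nd = NormalDist()
for alpha in (0.05, 0.01, 0.005):
    z_old = nd.inv_cdf(1 - alpha)        # current one-sided boundary
    z_new = nd.inv_cdf(1 - alpha / 10)   # boundary after dividing alpha by 10
    print(f"alpha = {alpha:.3f}: old z = {z_old:.3f}, new z = {z_new:.3f}")
```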

As I am aware we are not the only ones to have submitted a letter about Johnson’s proposal, I am quite curious about the reception we will get from the editor! (Although I have to point out that all of my earlier submissions of letters to PNAS got accepted.)

10 Little’s simple ideas

Posted in Books, Statistics, University life on July 17, 2013 by xi'an

“I still feel that too much of academic statistics values complex mathematics over elegant simplicity — it is necessary for a research paper to be complicated in order to be published.” Roderick Little, JASA, p.359

Roderick Little wrote his Fisher lecture, recently published in JASA, around ten simple ideas for statistics. Its title is “In praise of simplicity not mathematistry! Ten simple powerful ideas for the statistical scientist”. While this title is rather antagonistic, blaming mathematical statistics for the rise of mathematistry in the field (a term borrowed from Fisher, who also invented the adjective ‘Bayesian’), the paper focuses on those 10 ideas and very little on why there is (would be) too much mathematics in statistics:

  1. Make outcomes univariate
  2. Bayes rule, for inference under an assumed model
  3. Calibrated Bayes, to keep inference honest
  4. Embrace well-designed simulation experiments
  5. Distinguish the model/estimand, the principles of estimation, and computational methods
  6. Parsimony — seek a good simple model, not the “right” model
  7. Model the Inclusion/Assignment and try to make it ignorable
  8. Consider dropping parts of the likelihood to reduce the modeling part
  9. Potential outcomes and principal stratification for causal inference
  10. Statistics is basically a missing data problem

“The mathematics of problems with infinite parameters is interesting, but with finite sample sizes, I would rather have a parametric model. “Mathematistry” may eschew parametric models because the asymptotic theory is too simple, but they often work well in practice.” Roderick Little, JASA, p.365

Both those rules and the illustrations that abound in the paper reflect Little’s research focus and obviously apply to his models in a fairly coherent way. However, while a mostly parametric model user myself, I fear the rejection of non-parametric techniques is far too radical. It is more and more my conviction that we cannot handle the full complexity of a realistic structure in a standard Bayesian manner and that we have to give up on the coherence and completeness goals at some point… Using non-parametrics and/or machine learning on some bits and pieces then makes sense, even though it hurts elegance and simplicity.

“However, fully Bayes inference requires detailed probability modeling, which is often a daunting task. It seems worth sacrificing some Bayesian inferential purity if the task can be simplified.” Roderick Little, JASA, p.366

I will not discuss those ideas in detail, as some of them make complete sense to me (like Bayesian statistics laying its assumptions in the open) and others remain obscure (e.g., causality) or with limited applicability. It is overall a commendable Fisher lecture that focuses on methodology and the practice of statistical science, rather than on theory. I however do not see the reason why maths should be blamed for this state of the field. Nor why mathematical statistics journals like AoS would carry some responsibility for the lack of further applicability in other fields. Students of statistics do need a strong background in mathematics and I fear we are losing ground in this respect, at least judging by the growing difficulty in finding measure theory courses abroad for our exchange undergraduates from Paris-Dauphine. (I also find the model misspecification aspects mostly missing from this list.)

the anti-Bayesian moment and its passing

Posted in Books, Statistics, University life on October 30, 2012 by xi'an

Today, our reply to the discussion of our American Statistician paper “Not only defended but also applied” by Stephen Fienberg, Wes Johnson, Deborah Mayo, and Stephen Stigler, was posted on arXiv. It is kind of funny that this happens the day I am visiting Iowa State University Statistics Department, a department that was formerly a Fisherian and thus anti-Bayesian stronghold. (Not any longer, to be sure! I was also surprised to discover that before the creation of the department, Henry Wallace came to lecture on machine calculations for statistical methods… in 1924!)

The reply to the discussion was rewritten and much broadened by Andrew after I drafted a more classical point-by-point reply to our four discussants, much to its improvement. For one thing, it reads well on its own, as the discussions are not yet available on-line. For another, it gives a broader impact of the discussion, which suits well the readership of The American Statistician. (Some of my draft reply is recycled in this post.)
