Archive for fiducial distribution

minibatch acceptance for Metropolis-Hastings

Posted in Books, Statistics with tags , , , , , on January 12, 2018 by xi'an

An arXival that appeared last July by Seita, Pan, Chen, and Canny, and that relates to my current interest in speeding up MCMC. And to 2014 papers by  Korattikara et al., and Bardenet et al. Published in Uncertainty in AI by now. The authors claim that their method requires less data per iteration than earlier ones…

“Our test is applicable when the variance (over data samples) of the log probability ratio between the proposal and the current state is less than one.”

By test, the authors mean a mini-batch formulation of the Metropolis-Hastings acceptance ratio in the (special) setting of iid data. First they use Barker’s version of the acceptance probability instead of Metropolis’. Second, they use a Gaussian approximation to the distribution of the logarithm of the Metropolis ratio for the minibatch, while the Barker acceptance step corresponds to comparing a logistic perturbation of the logarithm of the Metropolis ratio against zero. Which amounts to compare the logarithm of the Metropolis ratio for the minibatch, perturbed by a logistic minus Normal variate. (The cancellation of the Normal in eqn (13) is a form of fiducial fallacy, where the Normal variate has two different meanings. In other words, the difference of two Normal variates is not equal to zero.) However, the next step escapes me as the authors seek to optimise the distribution of this logistic minus Normal variate. Which I thought was uniquely defined as such a difference. Another constraint is that the estimated variance of the log-likelihood ratio gets below one. (Why one?) The argument is that the average of the individual log-likelihoods is approximately Normal by virtue of the Central Limit Theorem. Even when randomised. While the illustrations on a Gaussian mixture and on a logistic regression demonstrate huge gains in computational time, it is unclear to me to which amount one can trust the approximation for a given model and sample size…

on confidence distributions

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , on January 10, 2018 by xi'an

As Regina Liu gave her talk at ISI this morning on fusion learning and confidence distributions, this led me to think anew about this strange notion of confidence distributions, building a distribution on the parameter space without a prior to go with it, implicitly or explicitly, and vaguely differing from fiducial inference. (As an aside, the Wikipedia page on confidence distributions is rather heavily supporting the concept and was primarily written by someone from Rutgers, where the modern version was developed. [And as an aside inside the aside, Schweder and Hjort’s book is sitting in my office, waiting for me!])

Recall that a confidence distribution is a sample dependent distribution on the parameter space, which is uniform U(0,1) [in the sample] at the “true” value of the parameter. Used thereafter as a posterior distribution. (Again, almost always without a prior to go with it. Which is an incoherence from a probabilistic perspective. not mentioning the issue of operating without a pre-defined dominating measure. This measure issue is truly bothering me!) This seems to include fiducial distributions based on a pivot, unless I am confused. As noted in the review by Nadarajah et al. Moreover, the concept of creating a pseudo-posterior out of an existing (frequentist) confidence interval procedure to create a new (frequentist) procedure does not carry an additional validation per se, as it clearly depends on the choice of the initialising procedure. (Not even mentioning the lack of invariance and the intricacy of multidimensional extensions.)

fiducial inference

Posted in Books, Mountains, pictures, Running, Statistics, Travel with tags , , , , , , , , , , on October 30, 2017 by xi'an

In connection with my recent tale of the many ε’s, I received from Gunnar Taraldsen [from Tronheim, Norge] a paper [jointly written with Bo Lindqvist and just appeared on-line in JSPI] on conditional fiducial models.

“The role of the prior and the statistical model in Bayesian analysis is replaced by the use of the fiducial model x=R(θ,ε) in fiducial inference. The fiducial is obtained in this case without a prior distribution for the parameter.”

Reading this paper after addressing the X validated question made me understood better the fundamental wrongness of fiducial analysis! If I may herein object to Fisher himself… Indeed, when writing x=R(θ,ε), as the representation of the [observed] random variable x as a deterministic transform of a parameter θ and of an [unobserved] random factor ε, the two random variables x and ε are based on the same random preimage ω, i.e., x=x(ω) and ε=ε(ω). Observing x hence sets a massive constraint on the preimage ω and on the conditional distribution of ε=ε(ω). When the fiducial inference incorporates another level of randomness via an independent random variable ε’ and inverts x=R(θ,ε’) into θ=θ(x,ε’), assuming there is only one solution to the inversion, it modifies the nature of the underlying σ-algebra into something that is incompatible with the original model. Because of this sudden duplication of the random variates. While the inversion of this equation x=R(θ,ε’) gives an idea of the possible values of θ when ε varies according to its [prior] distribution, it does not account for the connection between x and ε. And does not turn the original parameter into a random variable with an implicit prior distribution.

As to conditional fiducial distributions, they are defined by inversion of x=R(θ,ε), under a certain constraint on θ, like C(θ)=0, which immediately raises a Pavlovian reaction in me, namely that since the curve C(θ)=0 has measure zero under the original fiducial distribution, how can this conditional solution be uniquely or at all defined. Or to avoid the Borel paradox mentioned in the paper. If I get the meaning of the authors in this section, the resulting fiducial distribution will actually depend on the choice of σ-algebra governing the projection.

“A further advantage of the fiducial approach in the case of a simple fiducial model is that independent samples are produced directly from independent sampling from [the fiducial distribution]. Bayesian simulations most often come as dependent samples from a Markov chain.”

This side argument in “favour” of the fiducial approach is most curious as it brings into the picture computational aspects that do not have any reason to be there. (The core of the paper is concerned with the unicity of the fiducial distribution in some univariate settings. Not with computational issues.)

Validity and the foundations of statistical inference

Posted in Statistics with tags , , , , , , , , on July 29, 2016 by xi'an

Natesh pointed out to me this recent arXival with a somewhat grandiose abstract:

In this paper, we argue that the primary goal of the foundations of statistics is to provide data analysts with a set of guiding principles that are guaranteed to lead to valid statistical inference. This leads to two new questions: “what is valid statistical inference?” and “do existing methods achieve this?” Towards answering these questions, this paper makes three contributions. First, we express statistical inference as a process of converting observations into degrees of belief, and we give a clear mathematical definition of what it means for statistical inference to be valid. Second, we evaluate existing approaches Bayesian and frequentist approaches relative to this definition and conclude that, in general, these fail to provide valid statistical inference. This motivates a new way of thinking, and our third contribution is a demonstration that the inferential model framework meets the proposed criteria for valid and prior-free statistical inference, thereby solving perhaps the most important unsolved problem in statistics.

Since solving the “most important unsolved problem in statistics” sounds worth pursuing, I went and checked the paper‘s contents.

“To us, the primary goal of the foundations of statistics is to provide a set of guiding principles that, if followed, will guarantee validity of the resulting inference. Our motivation for writing this paper is to be clear about what is meant by valid inference and to provide the necessary principles to help data analysts achieve validity.”

Which can be interpreted in so many ways that it is somewhat meaningless…

“…if real subjective prior information is available, we recommend using it. However, there is an expanding collection of work (e.g., machine learning, etc) that takes the perspective that no real prior information is available. Even a large part of the literature claiming to be Bayesian has abandoned the interpretation of the prior as a serious part of the model, opting for “default” prior that “works.” Our choice to omit a prior from the model is not for the (misleading) purpose of being “objective”—subjectivity is necessary—but, rather, for the purpose of exploring what can be done in cases where a fully satisfactory prior is not available, to see what improvements can be made over the status quo.”

This is a pretty traditional criticism of the Bayesian approach, namely that if a “true” prior is provided (by whom?) then it is optimal to use it. But this amounts to turn the prior into another piece of the sampling distribution and is not in my opinion a Bayesian argument! Most of the criticisms in the paper are directed at objective Bayes approaches, with the surprising conclusion that, because there exist cases where no matching prior is available, “the objective Bayesian approach [cannot] be considered as a general framework for scientific inference.” (p.9)

Another section argues that a Bayesian modelling cannot describe a state of total ignorance. This is formally correct, which is why there is no such thing as a non-informative or the non-informative prior, as often discussed here, but is this truly relevant, in that the inference problem contains one way or another information about the parameter, for instance through a loss function or a pseudo-likelihood.

“This is a desirable property that most existing methods lack.”

The proposal central to the paper thesis is to replace posterior probabilities by belief functions b(.|X), called statistical inference, that are interpreted as measures of evidence about subsets A of the parameter space. If not necessarily as probabilities. This is not very novel, witness the works of Dempster, Shafer and subsequent researchers. And not very much used outside Bayesian and fiducial statistics because of the mostly impossible task of defining a function over all subsets of the parameter space. Because of the subjectivity of such “beliefs”, they will be “valid” only if they are well-calibrated in the sense of b(A|X) being sub-uniform, that is, more concentrated near zero than a uniform variate (i.e., small) under the alternative, i.e. when θ is not in A. At this stage, since this is a mix of a minimax and proper coverage condition, my interest started to quickly wane… Especially because the sub-uniformity condition is highly demanding, if leading to controls over the Type I error and the frequentist coverage. As often, I wonder at the meaning of a calibration property obtained over all realisations of the random variable and all values of the parameter. So for me stability is neither “desirable” nor “essential”. Overall, I have increasing difficulties in perceiving proper coverage as a relevant property. Which has no stronger or weaker meaning that the coverage derived from a Bayesian construction.

“…frequentism does not provide any guidance for selecting a particular rule or procedure.”

I agree with this assessment, which means that there is no such thing as frequentist inference, but rather a philosophy for assessing procedures. That the Gleser-Hwang paradox invalidates this philosophy sounds a bit excessive, however. Especially when the bounded nature of Bayesian credible intervals is also analysed as a failure. A more relevant criticism is the lack of directives for picking procedures.

“…we are the first to recognize that the belief function’s properties are necessary in order for the inferential output to satisfy the required validity property”

The construction of the “inferential model” proposed by the authors offers similarities withn fiducial inference, in that it builds upon the representation of the observable X as X=a(θ,U). With further constraints on the function a() to ensure the validity condition holds… An interesting point is that the functional connection X=a(θ,U) means that the nature of U changes once X is observed, albeit in a delicate manner outside a Bayesian framework. When illustrated on the Gleser-Hwang paradox, the resolution proceeds from an arbitrary choice of a one-dimensional summary, though. (As I am reading the paper, I realise it builds on other and earlier papers by the authors, papers that I cannot read for lack of time. I must have listned to a talk by one of the authors last year at JSM as this rings a bell. Somewhat.) In conclusion of a quick Sunday afternoon read, I am not convinced by the arguments in the paper and even less by the impression of a remaining arbitrariness in setting the resulting procedure.

ISBA 2016 [#5]

Posted in Mountains, pictures, Running, Statistics, Travel with tags , , , , , , , , , , , , , on June 18, 2016 by xi'an

from above Forte Village, Santa Magherita di Pula, Sardinia, June 17, 2016On Thursday, I started the day by a rather masochist run to the nearby hills, not only because of the very hour but also because, by following rabbit trails that were not intended for my size, I ended up being scratched by thorns and bramble all over!, but also with neat views of the coast around Pula.  From there, it was all downhill [joke]. The first morning talk I attended was by Paul Fearnhead and about efficient change point estimation (which is an NP hard problem or close to). The method relies on dynamic programming [which reminded me of one of my earliest Pascal codes about optimising a dam debit]. From my spectator’s perspective, I wonder[ed] at easier models, from Lasso optimisation to spline modelling followed by testing equality between bits. Later that morning, James Scott delivered the first Bayarri Lecture, created in honour of our friend Susie who passed away between the previous ISBA meeting and this one. James gave an impressive coverage of regularisation through three complex models, with the [hopefully not degraded by my translation] message that we should [as Bayesians] focus on important parts of those models and use non-Bayesian tools like regularisation. I can understand the practical constraints for doing so, but optimisation leads us away from a Bayesian handling of inference problems, by removing the ascertainment of uncertainty…

Later in the afternoon, I took part in the Bayesian foundations session, discussing the shortcomings of the Bayes factor and suggesting the use of mixtures instead. With rebuttals from [friends in] the audience!

This session also included a talk by Victor Peña and Jim Berger analysing and answering the recent criticisms of the Likelihood principle. I am not sure this answer will convince the critics, but I won’t comment further as I now see the debate as resulting from a vague notion of inference in Birnbaum‘s expression of the principle. Jan Hannig gave another foundation talk introducing fiducial distributions (a.k.a., Fisher’s Bayesian mimicry) but failing to provide a foundational argument for replacing Bayesian modelling. (Obviously, I am definitely prejudiced in this regard.)

The last session of the day was sponsored by BayesComp and saw talks by Natesh Pillai, Pierre Jacob, and Eric Xing. Natesh talked about his paper on accelerated MCMC recently published in JASA. Which surprisingly did not get discussed here, but would definitely deserve to be! As hopefully corrected within a few days, when I recoved from conference burnout!!! Pierre Jacob presented a work we are currently completing with Chris Holmes and Lawrence Murray on modularisation, inspired from the cut problem (as exposed by Plummer at MCMski IV in Chamonix). And Eric Xing spoke about embarrassingly parallel solutions, discussed a while ago here.

JSM 2015 [day #3]

Posted in Books, Statistics, University life with tags , , , , , , , , , , on August 12, 2015 by xi'an

My first morning session was about inference for philogenies. While I was expecting some developments around Kingman’s  coalescent models my coauthors needed and developped ABC for, I was surprised to see models that were producing closed form (or close enough to) likelihoods. Due to strong restrictions on the population sizes and migration possibilities, as explained later to me by Vladimir Minin. No need for ABC there since MCMC was working on the species trees, with Vladimir Minin making use of [the Savage Award winner] Vinayak Rao’s approach on trees that differ from the coalescent. And enough structure to even consider and demonstrate tree identifiability in Laura Kubatko’s case.

I then stopped by the astrostatistics session as the first talk by Gwendolin Eddie was about galaxy mass estimation, a problem I may actually be working on in the Fall, but it ended up being a completely different problem and I was further surprised that the issue of whether or not the data was missing at random was not considered by the authors.searise3

Christening a session Unifying foundation(s) may be calling for trouble, at least from me! In this spirit, Xiao Li Meng gave a talk attempting at a sort of unification of the frequentist, Bayesian, and fiducial paradigms by introducing the notion of personalized inference, which is a notion I had vaguely thought of in the past. How much or how far do you condition upon? However, I have never thought of this justifying fiducial inference in any way and Xiao Li’s lively arguments during and after the session not enough to convince me of the opposite: Prior-free does not translate into (arbitrary) choice-free. In the earlier talk about confidence distributions by Regina Liu and Minge Xie, that I partly missed for Galactic reasons, I just entered into the room at the very time when ABC was briefly described as a confidence distribution because it was not producing a convergent approximation to the exact posterior, a logic that escapes me (unless those confidence distributions are described in such a loose way as to include about any method f inference). Dongchu Sun also gave us a crash course on reference priors, with a notion of random posteriors I had not heard of before… As well as constructive posteriors… (They seemed to mean constructible matching priors as far as I understood.)

The final talk in this session by Chuanhei Liu on a new approach (modestly!) called inferential model was incomprehensible, with the speaker repeatedly stating that the principles were too hard to explain in five minutes and needed an incoming book… I later took a brief look at an associated paper, which relates to fiducial inference and to Dempster’s belief functions. For me, it has the same Münchhausen feeling of creating a probability out of nothing, creating a distribution on the parameter by ignoring the fact that the fiducial equation x=a(θ,u) modifies the distribution of u once x is observed.

reflections on the probability space induced by moment conditions with implications for Bayesian Inference [slides]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , on December 4, 2014 by xi'an

defsunset2Here are the slides of my incoming discussion of Ron Gallant’s paper, tomorrow.