## Gibbs for kidds

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , on February 12, 2018 by xi'an

A chance (?) question on X validated brought me to re-read Gibbs for Kids, 25 years after it was written (by my close friends George and Ed). The originator of the question had difficulties with the implementation, apparently missing the cyclic pattern of the sampler, as in equations (2.3) and (2.4), and with the convergence, which is only processed for a finite support in the American Statistician paper. The paper [which did not appear in American Statistician under this title!, but inspired an animal bredeer, Dan Gianola, to write a “Gibbs for pigs” presentation in 1993 at the 44th Annual Meeting of the European Association for Animal Production, Aarhus, Denmark!!!] most appropriately only contains toy examples since those can be processed and compared to know stationary measures. This is for instance the case for the auto-exponential model

$f(x,y) \propto exp(-xy)$

which is only defined as a probability density for a compact support. (The paper does not identify the model as a special case of auto-exponential model, which apparently made the originator of the model, Julian Besag in 1974, unhappy, as George and I found out when visiting Bath, where Julian was spending the final year of his life, many years later.) I use the limiting case all the time in class to point out that a Gibbs sampler can be devised and operate without a stationary probability distribution. However, being picky!, I would like to point out that, contrary, to a comment made in the paper, the Gibbs sampler does not “fail” but on the contrary still “converges” in this case, in the sense that a conditional ergodic theorem applies, i.e., the ratio of the frequencies of visits to two sets A and B with finite measure do converge to the ratio of these measures. For instance, running the Gibbs sampler 10⁶ steps and ckecking for the relative frequencies of x’s in (1,2) and (1,3) gives 0.685, versus log(2)/log(3)=0.63, since 1/x is the stationary measure. One important and influential feature of the paper is to stress that proper conditionals do not imply proper joints. George would work much further on that topic, in particular with his PhD student at the time, my friend Jim Hobert.

With regard to the convergence issue, Gibbs for Kids points out to Schervish and Carlin (1990), which came quite early when considering Gelfand and Smith published their initial paper the very same year, but which also adopts a functional approach to convergence, along the paper’s fixed point perspective, somehow complicating the matter. Later papers by Tierney (1994), Besag (1995), and Mengersen and Tweedie (1996) considerably simplified the answer, which is that irreducibility is a necessary and sufficient condition for convergence. (Incidentally, the reference list includes a technical report of mine’s on latent variable model MCMC implementation that never got published.)

## an improvable Rao–Blackwell improvement, inefficient maximum likelihood estimator, and unbiased generalized Bayes estimator

Posted in Books, Statistics, University life with tags , , , , , , , , on February 2, 2018 by xi'an

In my quest (!) for examples of location problems with no UMVU estimator, I came across a neat paper by Tal Galili [of R Bloggers fame!] and Isaac Meilijson presenting somewhat paradoxical properties of classical estimators in the case of a Uniform U((1-k)θ,(1+k)θ) distribution when 0<k<1 is known. For this model, the minimal sufficient statistic is the pair made of the smallest and of the largest observations, L and U. Since this pair is not complete, the Rao-Blackwell theorem does not produce a single and hence optimal estimator. The best linear unbiased combination [in terms of its variance] of L and U is derived in this paper, although this does not produce the uniformly minimum variance unbiased estimator, which does not exist in this case. (And I do not understand the remark that

“Any unbiased estimator that is a function of the minimal sufficient statistic is its own Rao–Blackwell improvement.”

as this hints at an infinite sequence of improvement.) While the MLE is inefficient in this setting, the Pitman [best equivariant] estimator is both Bayes [against the scale Haar measure] and unbiased. While experimentally dominating the above linear combination. The authors also argue that, since “generalized Bayes rules need not be admissible”, there is no guarantee that the Pitman estimator is admissible (under squared error loss). But given that this is a uni-dimensional scale estimation problem I doubt very much there is a Stein effect occurring in this case.

## foundations of probability

Posted in Books, Statistics with tags , , , , on December 1, 2017 by xi'an

Following my reading of a note by Gunnar Taraldsen and co-authors on improper priors, I checked the 1970 book of Rényi from the Library at Warwick. (First time I visited this library, where I get very efficient help in finding and borrowing this book!)

“…estimates of probability of an event made by different persons may be different and each such estimate is to a certain extent subjective.” (p.33)

The main argument from Rényi used by the above mentioned note (and an earlier paper in The American Statistician) is that “every probability is in reality a conditional probability” (p.34). Which may be a pleonasm as everything depends on the settings in which it is applied. And as such not particularly new since conditioning is also present in e.g. Jeffreys’ book. In this approach, the definition of the conditional probability is traditional, if restricted to condition on a subset of elements from the σ algebra. The interesting part in the book is rather that a measure on this subset can be derived from the conditionals. And extended to the whole σ algebra. And is unique up to a multiplicative constant. Interesting because this indeed produces a rigorous way of handling improper priors.

“Let the random point (ξ,η) be uniformly distributed over the whole (x,y) plane.” (p.83)

Rényi also defines random variables ξ on conditional probability spaces, with conditional densities. With constraints on ξ for those to exist. I have more difficulties to ingest this notion as I do not see the meaning of the above quote or of the quantity

P(a<ξ<b|c<ξ<d)

when P(a<ξ<b) is not defined. As for instance I see no way of generating such a ξ in this case. (Of course, it is always possible to bring in a new definition of random variables that only agrees with regular ones for finite measure.)

## stop the rot!

Posted in Statistics with tags , , , , , , , , , , , , on September 26, 2017 by xi'an

The second article in Nature is from a group of epidemiologists at the same institute, producing statistics about biomedical publications in predatory journals (characterised as such by the defunct Beall blacklist). And being much more vehement about the danger represented by these journals, which “articles we examined were atrocious in terms of reporting”, and authors submitting to them, as unethical for wasting human and animal observations. The authors of this article identify thirteen characteristics for spotting predatory journals, the first one being “low article-processing fees”, our own misadventure being the opposite. And they ask for higher control and auditing from the funding institutions over their researchers… Besides adding an extra-layer to the bureaucracy, I fear this is rather naïve, as if the boundary between predatory and non-predatory journals was crystal clear, rather than a murky continuum. And putting the blame solely on the researchers rather than sharing it with institutions always eager to push their bibliometrics towards more automation of the assessment of their researchers.

## priors without likelihoods are like sloths without…

Posted in Books, Statistics with tags , , , , , , , , , , , , on September 11, 2017 by xi'an

“The idea of building priors that generate reasonable data may seem like an unusual idea…”

Andrew, Dan, and Michael arXived a opinion piece last week entitled “The prior can generally only be understood in the context of the likelihood”. Which connects to the earlier Read Paper of Gelman and Hennig I discussed last year. I cannot state strong disagreement with the positions taken in this piece, actually, in that I do not think prior distributions ever occur as a given but are rather chosen as a reference measure to probabilise the parameter space and eventually prioritise regions over others. If anything I find myself even further on the prior agnosticism gradation.  (Of course, this lack of disagreement applies to the likelihood understood as a function of both the data and the parameter, rather than of the parameter only, conditional on the data. Priors cannot be depending on the data without incurring disastrous consequences!)

“…it contradicts the conceptual principle that the prior distribution should convey only information that is available before the data have been collected.”

The first example is somewhat disappointing in that it revolves as so many Bayesian textbooks (since Laplace!) around the [sex ratio] Binomial probability parameter and concludes at the strong or long-lasting impact of the Uniform prior. I do not see much of a contradiction between the use of a Uniform prior and the collection of prior information, if only because there is not standardised way to transfer prior information into prior construction. And more fundamentally because a parameter rarely makes sense by itself, alone, without a model that relates it to potential data. As for instance in a regression model. More, following my epiphany of last semester, about the relativity of the prior, I see no damage in the prior being relevant, as I only attach a relative meaning to statements based on the posterior. Rather than trying to limit the impact of a prior, we should rather build assessment tools to measure this impact, for instance by prior predictive simulations. And this is where I come to quite agree with the authors.

“…non-identifiabilities, and near nonidentifiabilites, of complex models can lead to unexpected amounts of weight being given to certain aspects of the prior.”

Another rather straightforward remark is that non-identifiable models see the impact of a prior remain as the sample size grows. And I still see no issue with this fact in a relative approach. When the authors mention (p.7) that purely mathematical priors perform more poorly than weakly informative priors it is hard to see what they mean by this “performance”.

“…judge a prior by examining the data generating processes it favors and disfavors.”

Besides those points, I completely agree with them about the fundamental relevance of the prior as a generative process, only when the likelihood becomes available. And simulatable. (This point is found in many references, including our response to the American Statistician paper Hidden dangers of specifying noninformative priors, with Kaniav Kamary. With the same illustration on a logistic regression.) I also agree to their criticism of the marginal likelihood and Bayes factors as being so strongly impacted by the choice of a prior, if treated as absolute quantities. I also if more reluctantly and somewhat heretically see a point in using the posterior predictive for assessing whether a prior is relevant for the data at hand. At least at a conceptual level. I am however less certain about how to handle improper priors based on their recommendations. In conclusion, it would be great to see one [or more] of the authors at O-Bayes 2017 in Austin as I am sure it would stem nice discussions there! (And by the way I have no prior idea on how to conclude the comparison in the title!)

## ASA’s statement on p-values [#2]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , on March 9, 2016 by xi'an

It took a visit on FiveThirtyEight to realise the ASA statement I mentioned yesterday was followed by individual entries from most members of the panel, much more diverse and deeper than the statement itself! Without discussing each and all comments, some points I subscribe to

• it does not make sense to try to replace the p-value and the 5% boundary by something else but of the same nature. This was the main line of our criticism of Valen Johnson’s PNAS paper with Andrew.
• it does not either make sense to try to come up with a hard set answer about whether or not a certain parameter satisfies a certain constraint. A comparison of predictive performances at or around the observed data sounds much more sensible, if less definitive.
• the Bayes factor is often advanced as a viable alternative to the p-value in those comments, but it suffers from difficulties exposed in our recent testing by mixture paper, one being the lack of absolute scale.
• we seem unable to escape the landscape set by Neyman and Pearson when constructing their testing formalism, including the highly unrealistic 0-1 loss function. And the grossly asymmetric opposition between null and alternative hypotheses.
• the behaviour of any procedure of choice should be evaluated under different scenarios, most likely by simulation, including some accounting for misspecified models. Which may require an extra bit of non-parametrics. And we should abstain from considering further than evaluating whether or not the data looks compatible with each of the scenarios. Or how much through the mixture representation.

## ASA’s statement on p-values

Posted in Books, Statistics, University life with tags , , , , , on March 8, 2016 by xi'an

Last night I received an email from the ASA signed by Jessica Utts and Ron Wasserstein with the following sentence

“Widespread use of ‘statistical significance’ (generally interpreted as ‘p

In short, we envision a new era, in which the broad scientific community recognizes what statisticians have been advocating for many years. In this “post p

Is such an era beyond reach? We think not, but we need your help in making sure this opportunity is not lost.”

which is obviously missing important bits. The email was pointing out a free access American Statistician article warning about the misuses and over-interpretations of p-values. Which contains rather basic “principles” that p-values are not probabilities that the null is true, that there is no golden level against which to compare the p-value, that nominal p-values may be far from actual p-values, that they do not provide a measure of evidence per se, &tc. As written in the conclusion, “Nothing in the ASA statement is new”. But, besides calling for caution and the cumulative use of different assessments of evidence, this statement may leave the non-statistician completely nonplussed about how to proceed when testing hypotheses or comparing models. And make the decision of Basic and Applied Social Psychology of rejecting all arguments based on p-values sound sensible.

Incidentally, the article contains the completion of the first sentence [in red below], if not of the second:

“Widespread use of ‘statistical significance’ (generally interpreted as ‘p≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.