## foundations of probability

Posted in Books, Statistics with tags , , , , on December 1, 2017 by xi'an

Following my reading of a note by Gunnar Taraldsen and co-authors on improper priors, I checked the 1970 book of Rényi from the Library at Warwick. (First time I visited this library, where I got very efficient help in finding and borrowing this book!)

“…estimates of probability of an event made by different persons may be different and each such estimate is to a certain extent subjective.” (p.33)

The main argument from Rényi used by the above mentioned note (and an earlier paper in The American Statistician) is that “every probability is in reality a conditional probability” (p.34). Which may be a pleonasm as everything depends on the settings in which it is applied. And as such not particularly new since conditioning is also present in e.g. Jeffreys’ book. In this approach, the definition of the conditional probability is traditional, if restricted to condition on a subset of elements from the σ algebra. The interesting part in the book is rather that a measure on this subset can be derived from the conditionals. And extended to the whole σ algebra. And is unique up to a multiplicative constant. Interesting because this indeed produces a rigorous way of handling improper priors.

“Let the random point (ξ,η) be uniformly distributed over the whole (x,y) plane.” (p.83)

Rényi also defines random variables ξ on conditional probability spaces, with conditional densities. With constraints on ξ for those to exist. I have more difficulties to ingest this notion as I do not see the meaning of the above quote or of the quantity

P(a<ξ<b|c<ξ<d)

when P(a<ξ<b) is not defined. As for instance I see no way of generating such a ξ in this case. (Of course, it is always possible to bring in a new definition of random variables that only agrees with regular ones for finite measure.)

## stop the rot!

Posted in Statistics with tags , , , , , , , , , , , , on September 26, 2017 by xi'an

The second article in Nature is from a group of epidemiologists at the same institute, producing statistics about biomedical publications in predatory journals (characterised as such by the defunct Beall blacklist). And being much more vehement about the danger represented by these journals, which “articles we examined were atrocious in terms of reporting”, and authors submitting to them, as unethical for wasting human and animal observations. The authors of this article identify thirteen characteristics for spotting predatory journals, the first one being “low article-processing fees”, our own misadventure being the opposite. And they ask for higher control and auditing from the funding institutions over their researchers… Besides adding an extra-layer to the bureaucracy, I fear this is rather naïve, as if the boundary between predatory and non-predatory journals was crystal clear, rather than a murky continuum. And putting the blame solely on the researchers rather than sharing it with institutions always eager to push their bibliometrics towards more automation of the assessment of their researchers.

## priors without likelihoods are like sloths without…

Posted in Books, Statistics with tags , , , , , , , , , , , , on September 11, 2017 by xi'an

“The idea of building priors that generate reasonable data may seem like an unusual idea…”

Andrew, Dan, and Michael arXived a opinion piece last week entitled “The prior can generally only be understood in the context of the likelihood”. Which connects to the earlier Read Paper of Gelman and Hennig I discussed last year. I cannot state strong disagreement with the positions taken in this piece, actually, in that I do not think prior distributions ever occur as a given but are rather chosen as a reference measure to probabilise the parameter space and eventually prioritise regions over others. If anything I find myself even further on the prior agnosticism gradation.  (Of course, this lack of disagreement applies to the likelihood understood as a function of both the data and the parameter, rather than of the parameter only, conditional on the data. Priors cannot be depending on the data without incurring disastrous consequences!)

“…it contradicts the conceptual principle that the prior distribution should convey only information that is available before the data have been collected.”

The first example is somewhat disappointing in that it revolves as so many Bayesian textbooks (since Laplace!) around the [sex ratio] Binomial probability parameter and concludes at the strong or long-lasting impact of the Uniform prior. I do not see much of a contradiction between the use of a Uniform prior and the collection of prior information, if only because there is not standardised way to transfer prior information into prior construction. And more fundamentally because a parameter rarely makes sense by itself, alone, without a model that relates it to potential data. As for instance in a regression model. More, following my epiphany of last semester, about the relativity of the prior, I see no damage in the prior being relevant, as I only attach a relative meaning to statements based on the posterior. Rather than trying to limit the impact of a prior, we should rather build assessment tools to measure this impact, for instance by prior predictive simulations. And this is where I come to quite agree with the authors.

“…non-identifiabilities, and near nonidentifiabilites, of complex models can lead to unexpected amounts of weight being given to certain aspects of the prior.”

Another rather straightforward remark is that non-identifiable models see the impact of a prior remain as the sample size grows. And I still see no issue with this fact in a relative approach. When the authors mention (p.7) that purely mathematical priors perform more poorly than weakly informative priors it is hard to see what they mean by this “performance”.

“…judge a prior by examining the data generating processes it favors and disfavors.”

Besides those points, I completely agree with them about the fundamental relevance of the prior as a generative process, only when the likelihood becomes available. And simulatable. (This point is found in many references, including our response to the American Statistician paper Hidden dangers of specifying noninformative priors, with Kaniav Kamary. With the same illustration on a logistic regression.) I also agree to their criticism of the marginal likelihood and Bayes factors as being so strongly impacted by the choice of a prior, if treated as absolute quantities. I also if more reluctantly and somewhat heretically see a point in using the posterior predictive for assessing whether a prior is relevant for the data at hand. At least at a conceptual level. I am however less certain about how to handle improper priors based on their recommendations. In conclusion, it would be great to see one [or more] of the authors at O-Bayes 2017 in Austin as I am sure it would stem nice discussions there! (And by the way I have no prior idea on how to conclude the comparison in the title!)

## ASA’s statement on p-values [#2]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , on March 9, 2016 by xi'an

It took a visit on FiveThirtyEight to realise the ASA statement I mentioned yesterday was followed by individual entries from most members of the panel, much more diverse and deeper than the statement itself! Without discussing each and all comments, some points I subscribe to

• it does not make sense to try to replace the p-value and the 5% boundary by something else but of the same nature. This was the main line of our criticism of Valen Johnson’s PNAS paper with Andrew.
• it does not either make sense to try to come up with a hard set answer about whether or not a certain parameter satisfies a certain constraint. A comparison of predictive performances at or around the observed data sounds much more sensible, if less definitive.
• the Bayes factor is often advanced as a viable alternative to the p-value in those comments, but it suffers from difficulties exposed in our recent testing by mixture paper, one being the lack of absolute scale.
• we seem unable to escape the landscape set by Neyman and Pearson when constructing their testing formalism, including the highly unrealistic 0-1 loss function. And the grossly asymmetric opposition between null and alternative hypotheses.
• the behaviour of any procedure of choice should be evaluated under different scenarios, most likely by simulation, including some accounting for misspecified models. Which may require an extra bit of non-parametrics. And we should abstain from considering further than evaluating whether or not the data looks compatible with each of the scenarios. Or how much through the mixture representation.

## ASA’s statement on p-values

Posted in Books, Statistics, University life with tags , , , , , on March 8, 2016 by xi'an

Last night I received an email from the ASA signed by Jessica Utts and Ron Wasserstein with the following sentence

“Widespread use of ‘statistical significance’ (generally interpreted as ‘p

In short, we envision a new era, in which the broad scientific community recognizes what statisticians have been advocating for many years. In this “post p

Is such an era beyond reach? We think not, but we need your help in making sure this opportunity is not lost.”

which is obviously missing important bits. The email was pointing out a free access American Statistician article warning about the misuses and over-interpretations of p-values. Which contains rather basic “principles” that p-values are not probabilities that the null is true, that there is no golden level against which to compare the p-value, that nominal p-values may be far from actual p-values, that they do not provide a measure of evidence per se, &tc. As written in the conclusion, “Nothing in the ASA statement is new”. But, besides calling for caution and the cumulative use of different assessments of evidence, this statement may leave the non-statistician completely nonplussed about how to proceed when testing hypotheses or comparing models. And make the decision of Basic and Applied Social Psychology of rejecting all arguments based on p-values sound sensible.

Incidentally, the article contains the completion of the first sentence [in red below], if not of the second:

“Widespread use of ‘statistical significance’ (generally interpreted as ‘p≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.

## A discussion on Bayesian analysis : Selecting Noninformative Priors

Posted in Statistics with tags , , , , , , on February 26, 2014 by xi'an

Following an earlier post on the American Statistician 2013 paper by Seaman III and co-authors, Hidden dangers of specifying noninformative priors, my PhD student Kaniav Kamary wrote a paper re-analysing the examples processed by those authors and concluding to the stability of the posterior distributions of the parameters and to the effect of the noninformative prior being essentially negligible. (This is the very first paper quoting verbatim from the ‘Og!) Kaniav logically submitted the paper to the American Statistician.

## hidden dangers of noninformative priors

Posted in Books, Statistics, University life with tags , , , , , on November 21, 2013 by xi'an

Last year, John Seaman (III), John Seaman (Jr.), and James Stamey published a paper in The American Statistician with the title Hidden dangers of specifying noninformative priors. (It does not seem to be freely available on-line.) I gave it to read to my PhD students, meaning to read towards the goal of writing a critical reply to the authors. In the meanwhile, here are my own two-cents on the paper.

“Applications typically employ Markov chain Monte Carlo (MCMC) methods to obtain posterior features, resulting in the need for proper priors, even when the modeler prefers that priors be relatively noninformative.” (p.77)

Apart from the above quote, which confuses proper priors with proper posteriors (maybe as the result of a contagious BUGS!), and which is used to focus solely and sort-of inappropriately on proper priors, there is no hard fact to bite in, but rather a collection of soft decisions and options that end up weakly supporting the authors’ thesis. (Obviously, following an earlier post, there is no such thing as a “noninformative” prior.) The paper is centred on four examples where a particular choice of (“noninformative”) prior leads to peaked or informative priors on some transform(s) of the parameters. Note that there is no definition provided for informative, non-informative, diffuse priors, except those found in BUGS with “extremely large variance” (p.77). (The quote below seems to settle on a uniform prior if one understands the “likely” as evaluated through the posterior density.) The argument of the authors is that “if parameters with diffuse proper priors are subsequently transformed, the resulting induced priors can, of course, be far from diffuse, possibly resulting in unintended influence on the posterior of the transformed parameters” (p.77).