**L**ast December, Gunnar Taraldsen, Jarle Tufto, and Bo H. Lindqvist arXived a paper on using priors that lead to improper posteriors and [trying to] get away with it! The central concept in their approach is Rényi's generalisation of Kolmogorov's axiomatisation, which defines conditional probability distributions from infinite mass measures by conditioning on finite mass measurable sets. A position adopted by Dennis Lindley in his 1964 book, and already discussed in a few 'Og's posts. While the theory thus developed indeed allows for the manipulation of improper posteriors, I have difficulties with the inferential aspects of the construct, since one cannot condition on an arbitrary finite measurable set without prior information. Things get a wee bit more delicate when considering "data" with infinite mass, in Section 4.2, since they cannot be properly normalised (although I find the example of the degenerate multivariate Gaussian distribution puzzling, as it is not a matter of improperness: the degenerate Gaussian has a well-defined density against the right dominating measure). The paper also discusses marginalisation paradoxes, acknowledging that marginalisation is no longer feasible with improper quantities. And the Jeffreys-Lindley paradox, with a resolution that uses the sum of the Dirac mass at the null, δ₀, and of the Lebesgue measure on the real line, λ, as the dominating measure. This indeed solves the issue of the arbitrary constant in the Bayes factor, since it is "the same" on the null hypothesis and elsewhere, but I do not buy the argument, as I see no reason to favour δ₀+λ over 3.141516 δ₀+λ or δ₀+1.61718 λ… (Section 4.5 also illustrates that the choice of the sequence of conditioning sets has an impact on the limiting measure, in the Rényi sense.) In conclusion, after reading the paper, I remain uncertain as to how to exploit this generalisation from an inferential (Bayesian?) viewpoint, since improper posteriors do not clearly lead to well-defined inferential procedures…
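The arbitrary-constant objection can be made explicit in schematic form (my notation, not the authors'): under the dominating measure μ_{a,b} = a δ₀ + b λ, taking the "uniform" prior π(dθ) = μ_{a,b}(dθ) gives posterior odds of the null

```latex
\frac{\mathbb{P}(\theta = 0 \mid x)}{\mathbb{P}(\theta \neq 0 \mid x)}
  \;=\; \frac{a \, f(x \mid 0)}{b \int_{\mathbb{R}} f(x \mid \theta)\,\mathrm{d}\theta},
```

so trading δ₀+λ for 3.141516 δ₀+λ multiplies the odds by 3.141516: normalising by a single dominating measure fixes the constant only up to this arbitrary choice.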

## Archive for Dennis Lindley

## statistics with improper posteriors [or not]

Posted in Statistics with tags Alfréd Rényi, Andrei Kolmogorov, Dennis Lindley, improper posteriors, improper priors, Jeffreys-Lindley paradox, marginalisation paradoxes on March 6, 2019 by xi'an

## p-values and decision-making [reposted]

Posted in Books, Statistics, University life with tags 0.005, 0.05, books, decision theory, Dennis Lindley, hypothesis testing, Nicholas T. Longford, p-values, Robert Matthews, Significance, statistical significance on August 30, 2017 by xi'an

**I**n a letter to *Significance* about a review of Robert Matthews's book, *Chancing It*, Nicholas Longford recalls a few basic facts about p-values and decision-making earlier made by Dennis Lindley in *Making Decisions*. Here are some excerpts, worth repeating in the light of the 0.005 proposal:

“A statement of significance based on a p-value is a verdict that is oblivious to consequences. In my view, this disqualifies hypothesis testing, and p-values with it, from making rational decisions. Of course, the p-value could be supplemented by considerations of these consequences, although this is rarely done in a transparent manner. However, the two-step procedure of calculating the p-value and then incorporating the consequences is unlikely to match in its integrity the single-stage procedure in which we compare the expected losses associated with the two contemplated options.”

“At present, [Lindley’s] decision-theoretical approach is difficult to implement in practice. This is not because of any computational complexity or some problematic assumptions, but because of our collective reluctance to inquire about the consequences – about our clients’ priorities, remits and value judgements. Instead, we promote a culture of “objective” analysis, epitomised by the 5% threshold in significance testing. It corresponds to a particular balance of consequences, which may or may not mirror our clients’ perspective.”

“The p-value and statistical significance are at best half-baked products in the process of making decisions, and a distraction at worst, because the ultimate conclusion of a statistical analysis should be a proposal for what to do next in our clients’ or our own research, business, production or some other agenda. Let’s reflect and admit how frequently we abuse hypothesis testing by adopting (sometimes by stealth) the null hypothesis when we fail to reject it, and therefore do so without any evidence to support it. How frequently we report, or are party to reporting, the results of hypothesis tests selectively. The problem is not with our failing to adhere to the convoluted strictures of a popular method, but with the method itself. In the 1950s, it was a great statistical invention, and its popularisation later on a great scientific success. Alas, decades later, it is rather out of date, like the steam engine. It is poorly suited to the demands of modern science, business, and society in general, in which the budget and pocketbook are important factors.”
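Longford's single-stage procedure, comparing posterior expected losses rather than thresholding a p-value, can be sketched numerically. This is my toy Gaussian example with unit losses, not a calculation from the letter or from *Making Decisions*: a point null θ = 0 against θ ~ N(0, τ²), known unit variance, where a sample mean "significant at 5%" is nonetheless accepted by the expected-loss rule.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def p_value(xbar, n):
    """Two-sided p-value for H0: theta = 0 from a N(theta, 1) sample mean."""
    z = abs(xbar) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))

def decide(xbar, n, tau2=1.0, loss_reject=1.0, loss_accept=1.0, prior_null=0.5):
    """Single-stage decision: pick the action with the smaller posterior expected loss."""
    m0 = normal_pdf(xbar, 1.0 / n)           # marginal of the mean under H0: theta = 0
    m1 = normal_pdf(xbar, tau2 + 1.0 / n)    # marginal under H1: theta ~ N(0, tau2)
    p0 = prior_null * m0 / (prior_null * m0 + (1 - prior_null) * m1)
    risk_reject = p0 * loss_reject           # expected loss of rejecting (wrong if H0 true)
    risk_accept = (1 - p0) * loss_accept     # expected loss of accepting (wrong if H1 true)
    return ("reject H0" if risk_reject < risk_accept else "accept H0"), p0

xbar, n = 0.2, 100
print(round(p_value(xbar, n), 3))  # ≈ 0.046, "significant" at the 5% level
print(decide(xbar, n))             # yet the expected-loss rule accepts H0
```

The losses and the prior are placeholders for the "clients' priorities and value judgements" the quote refers to; changing them changes the decision, which is precisely the point of making them explicit.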

## Gregynog #2 [jatp]

Posted in Kids, pictures, Running, Statistics, Travel, University life with tags Aberystwyth, Britain, Charles Hanbury-Tracy, concrete, Dennis Lindley, Gregynog, Gregynog Statistical Conference, jatp, Powys, seminar, Tregynon, University of Wales, University of Warwick, Wales on April 26, 2017 by xi'an

## X-Outline of a Theory of Statistical Estimation

Posted in Books, Statistics, University life with tags Bayesian Analysis, confidence intervals, credible intervals, Dennis Lindley, Harold Jeffreys, inference, Jerzy Neyman, maximum likelihood estimation, unbiasedness, University of Warwick, X-Outline on March 23, 2017 by xi'an

**W**hile visiting Warwick last week, Jean-Michel Marin pointed out and forwarded me this remarkable paper of Jerzy Neyman, published in 1937 and presented to the Royal Society by Harold Jeffreys.

“Leaving apart on one side the practical difficulty of achieving randomness and the meaning of this word when applied to actual experiments…”

“It may be useful to point out that although we are frequently witnessing controversies in which authors try to defend one or another system of the theory of probability as the only legitimate, I am of the opinion that several such theories may be and actually are legitimate, in spite of their occasionally contradicting one another. Each of these theories is based on some system of postulates, and so long as the postulates forming one particular system do not contradict each other and are sufficient to construct a theory, this is as legitimate as any other.”

This paper is fairly long, in part because Neyman starts by setting out Kolmogorov's axioms of probability. This is of historical interest but also needed for Neyman to oppose his notion of probability to Jeffreys' (which is the same from a formal perspective, I believe!). He actually spends a fair chunk explaining why constants cannot have anything but trivial probability measures, getting ready to state that an a priori distribution has no meaning (p.343) and that, in the rare cases it does, it is mostly unknown. While reading the paper, I thought that the distinction was more in terms of frequentist or conditional properties of the estimators, Neyman's arguments paving the way to his definition of a confidence interval, which assumes repeatability of the experiment under the same conditions and hence the same parameter value (p.344).

“The advantage of the unbiassed [sic] estimates and the justification of their use lies in the fact that in cases frequently met the probability of their differing very much from the estimated parameters is small.”

“…the maximum likelihood estimates appear to be what could be called the best “almost unbiassed [sic]” estimates.”

It is also quite interesting to read that the principle for insisting on unbiasedness is one of producing small errors, because this is not that often the case, as shown by the complete class theorems of Wald (ten years later). And that maximum likelihood is somewhat relegated to a secondary rank, "almost unbiased" being understood as consistent. A most amusing part of the paper is when Neyman inverts the credible set into a confidence set, that is, turning what is random into a constant and vice versa. With the justification that the credible interval has zero or one coverage, while the confidence interval has a long-run validity of returning the correct rate of success. What is equally amusing is that the boundaries of a credible interval turn into functions of the sample, hence could be evaluated on a frequentist basis, as done later by Dennis Lindley and others like Welch and Peers, but Neyman fails to see this and instead freezes the bounds into hard values for a given sample.

“This, however, is not always the case, and in general there are two or more systems of confidence intervals possible corresponding to the same confidence coefficient α, such that for certain sample points, E’, the intervals in one system are shorter than those in the other, while for some other sample points, E”, the reverse is true.”

The resulting construction of a confidence interval is then awfully convoluted when compared with the derivation of an HPD region, going through regions of acceptance that are the dual of a confidence interval (in the sampling space), while apparently [from my hasty read] missing a rule to order them. And rejecting the notion of a confidence interval being possibly empty, which, while being of practical interest, clashes with its frequentist backup.
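The point above, that credible bounds are functions of the sample and hence admit frequentist evaluation, can be illustrated with a toy simulation (my Gaussian example with a flat prior, not Neyman's construction): for a N(θ, 1) mean, the 95% credible interval coincides with the standard confidence interval and attains 95% long-run coverage.

```python
import math
import random

def credible_interval(xbar, n, z=1.959964):
    """95% HPD interval under a flat prior for the mean of N(theta, 1) data:
    the posterior is N(xbar, 1/n), so the bounds are functions of the sample."""
    half = z / math.sqrt(n)
    return xbar - half, xbar + half

def coverage(theta, n, reps=100_000, seed=42):
    """Long-run frequency with which the interval, recomputed on fresh samples,
    captures the fixed true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = theta + rng.gauss(0.0, 1.0) / math.sqrt(n)  # sampling dist. of the mean
        lo, hi = credible_interval(xbar, n)
        hits += lo <= theta <= hi
    return hits / reps

print(coverage(theta=2.0, n=10))  # close to 0.95
```

The coincidence of the two intervals is special to this flat-prior Gaussian case; the Welch and Peers line of work mentioned above is precisely about when such frequentist calibration of credible bounds holds more generally.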

## mixtures as exponential families

Posted in Kids, Statistics with tags Annals of Mathematical Statistics, cross validated, Dennis Lindley, exponential families, gamma distribution, label switching, location-scale parameterisation, mixtures of distributions on December 8, 2015 by xi'an

**S**omething I had not realised earlier and that came to me when answering a question on X validated about the scale parameter of a Gamma distribution. Following an earlier characterisation by Dennis Lindley, Ferguson wrote a famous paper characterising location, scale and location-scale families within exponential families. For instance, a one-parameter location exponential family is necessarily the logarithm of a power of a Gamma distribution. What I found surprising is the equivalent for one-parameter scale exponential families: they are necessarily mixtures of positive and negative powers of Gamma distributions. This is surprising because a mixture does not seem to fit within the exponential family representation… I first thought Ferguson was using a different type of mixture. Or of exponential family. But, after checking the details, it appears that the mixture involves a component on ℜ⁺ and another component on ℜ⁻, with a potential third component as a Dirac mass at zero. Hence, it only nominally writes as a mixture and does not offer the same challenges as a regular mixture. No label switching. No latent variable. Having mutually exclusive supports solves all those problems and even allows for an indicator function to permeate through the exponential function… (Recall that the special mixture processed in Rubio and Steel also enjoys this feature.)
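A schematic way to see why the mixture is only nominal (my notation, not Ferguson's exact statement): with mutually exclusive supports the density writes as

```latex
f(x \mid \theta) \;=\; w_+\, g_+(x;\theta)\,\mathbb{1}\{x>0\}
                 \;+\; w_-\, g_-(-x;\theta)\,\mathbb{1}\{x<0\},
```

and since exactly one term is non-zero at every x ≠ 0, assuming both components share the same natural parameter, the whole expression factorises pointwise as h(x) exp{θT(x) − ψ(θ)}, with h absorbing the weights and the indicators. No latent allocation variable is ever needed, hence no label switching.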