Archive for probability theory

essentials of probability theory for statisticians

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on April 25, 2020 by xi'an

On yet another confined sunny lazy Sunday morning, I read through Proschan and Shaw’s Essentials of Probability Theory for Statisticians, a CRC Press book that was sent to me quite a while ago for review. The book was indeed published in 2016. Before moving to serious things, let me evacuate the customary issue with the cover. I have trouble getting the point of the “face on Mars” being adopted as the cover of a book on probability theory (rather than a book on, say, pareidolia). There is a brief paragraph on post-facto probability calculations, stating how meaningless the question of the probability of this shade appearing on a Viking Orbiter picture by “chance”, but this is so marginal I would have preferred any other figure from the book!

The book plans to cover the probability essentials for dealing with graduate level statistics and in particular convergence, conditioning, and paradoxes following from using non-rigorous approaches to probability. A range that completely fits my own prerequisite for statistics students in my classes and that of course involves the recourse to (Lebesgue) measure theory. And a goal that I find both commendable and comforting as my past experience with exchange students led me to the feeling that rigorous probability theory was mostly scrapped from graduate programs. While the book is not extremely formal, it provides a proper motivation for the essential need of measure theory to handle the complexities of statistical analysis and in particular of asymptotics. It thus relies as much as possible on examples that stem from or relate to statistics, even though most examples may appear as standard to senior readers. For instance the consistency of the sample median or a weak version of the Glivenko-Cantelli theorem. The final chapter is dedicated to applications (in the probabilist’ sense!) that emerged from statistical problems. I felt these final chapters were somewhat stretched compared with what they could have been, as for instance with the multiple motivations of the conditional expectation, but this simply makes for more material. If I had to teach this material to students, I would certainly rely on the book! in particular because of the repeated appearances of the quincunx for motivating non-Normal limites. (A typo near Fatou’s lemma missed the dominating measure. And I did not notice the Riemann notation dx being extended to the measure in a formal manner.)

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

Mea Culpa

Posted in Statistics with tags , , , , , , , , , , , on April 10, 2020 by xi'an

[A quote from Jaynes about improper priors that I had missed in his book, Probability Theory.]

For many years, the present writer was caught in this error just as badly as anybody else, because Bayesian calculations with improper priors continued to give just the reasonable and clearly correct results that common sense demanded. So warnings about improper priors went unheeded; just that psychological phenomenon. Finally, it was the marginalization paradox that forced recognition that we had only been lucky in our choice of problems. If we wish to consider an improper prior, the only correct way of doing it is to approach it as a well-defined limit of a sequence of proper priors. If the correct limiting procedure should yield an improper posterior pdf for some parameter α, then probability theory is telling us that the prior information and data are too meager to permit any inferences about α. Then the only remedy is to seek more data or more prior information; probability theory does not guarantee in advance that it will lead us to a useful answer to every conceivable question.Generally, the posterior pdf is better behaved than the prior because of the extra information in the likelihood function, and the correct limiting procedure yields a useful posterior pdf that is analytically simpler than any from a proper prior. The most universally useful results of Bayesian analysis obtained in the past are of this type, because they tended to be rather simple problems, in which the data were indeed so much more informative than the prior information that an improper prior gave a reasonable approximation – good enough for all practical purposes – to the strictly correct results (the two results agreed typically to six or more significant figures).

In the future, however, we cannot expect this to continue because the field is turning to more complex problems in which the prior information is essential and the solution is found by computer. In these cases it would be quite wrong to think of passing to an improper prior. That would lead usually to computer crashes; and, even if a crash is avoided, the conclusions would still be, almost always, quantitatively wrong. But, since likelihood functions are bounded, the analytical solution with proper priors is always guaranteed to converge properly to finite results; therefore it is always possible to write a computer program in such a way (avoid underflow, etc.) that it cannot crash when given proper priors. So, even if the criticisms of improper priors on grounds of marginalization were unjustified,it remains true that in the future we shall be concerned necessarily with proper priors.

Markov Chains [not a book review]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , on January 14, 2019 by xi'an

As Randal Douc and Éric Moulines are both very close friends and two authors of this book on Markov chains,  I cannot engage into a regular book review! Judging from the table of contents, the coverage is not too dissimilar to the now classic Markov chain Stochastic Stability book by Sean Meyn and the late Richard Tweedie (1994), called the Bible of Markov chains by Peter Glynn, with more emphasis on convergence matters and a more mathematical perspective. The 757 pages book also includes a massive appendix on maths and probability background. As indicated in the preface, “the reason [the authors] thought it would be useful to write a new book is to survey some of the developments made during the 25 years that have elapsed since the publication of Meyn and Tweedie (1993b).” Connecting with the theoretical developments brought by MCMC methods. Like subgeometric rates of convergence to stationarity, sample paths, limit theorems, and concentration inequalities. The book also reflects on the numerous contributions of the authors to the field. Hence a perfect candidate for teaching Markov chains to mathematically well-prepared. graduate audiences. Congrats to the authors!

are there a frequentist and a Bayesian likelihoods?

Posted in Statistics with tags , , , , , , , , , , on June 7, 2018 by xi'an

A question that came up on X validated and led me to spot rather poor entries in Wikipedia about both the likelihood function and Bayes’ Theorem. Where unnecessary and confusing distinctions are made between the frequentist and Bayesian versions of these notions. I have already discussed the later (Bayes’ theorem) a fair amount here. The discussion about the likelihood is quite bemusing, in that the likelihood function is the … function of the parameter equal to the density indexed by this parameter at the observed value.

“What we can find from a sample is the likelihood of any particular value of r, if we define the likelihood as a quantity proportional to the probability that, from a population having the particular value of r, a sample having the observed value of r, should be obtained.” R.A. Fisher, On the “probable error’’ of a coefficient of correlation deduced from a small sample. Metron 1, 1921, p.24

By mentioning an informal side to likelihood (rather than to likelihood function), and then stating that the likelihood is not a probability in the frequentist version but a probability in the Bayesian version, the W page makes a complete and unnecessary mess. Whoever is ready to rewrite this introduction is more than welcome! (Which reminded me of an earlier question also on X validated asking why a common reference measure was needed to define a likelihood function.)

This also led me to read a recent paper by Alexander Etz, whom I met at E.J. Wagenmakers‘ lab in Amsterdam a few years ago. Following Fisher, as Jeffreys complained about

“..likelihood, a convenient term introduced by Professor R.A. Fisher, though in his usage it is sometimes multiplied by a constant factor. This is the probability of the observations given the original information and the hypothesis under discussion.” H. Jeffreys, Theory of Probability, 1939, p.28

Alexander defines the likelihood up to a constant, which causes extra-confusion, for free!, as there is no foundational reason to introduce this degree of freedom rather than imposing an exact equality with the density of the data (albeit with an arbitrary choice of dominating measure, never neglect the dominating measure!). The paper also repeats the message that the likelihood is not a probability (density, missing in the paper). And provides intuitions about maximum likelihood, likelihood ratio and Wald tests. But does not venture into a separate definition of the likelihood, being satisfied with the fundamental notion to be plugged into the magical formula


fiducial inference

Posted in Books, Mountains, pictures, Running, Statistics, Travel with tags , , , , , , , , , , on October 30, 2017 by xi'an

In connection with my recent tale of the many ε’s, I received from Gunnar Taraldsen [from Tronheim, Norge] a paper [jointly written with Bo Lindqvist and just appeared on-line in JSPI] on conditional fiducial models.

“The role of the prior and the statistical model in Bayesian analysis is replaced by the use of the fiducial model x=R(θ,ε) in fiducial inference. The fiducial is obtained in this case without a prior distribution for the parameter.”

Reading this paper after addressing the X validated question made me understood better the fundamental wrongness of fiducial analysis! If I may herein object to Fisher himself… Indeed, when writing x=R(θ,ε), as the representation of the [observed] random variable x as a deterministic transform of a parameter θ and of an [unobserved] random factor ε, the two random variables x and ε are based on the same random preimage ω, i.e., x=x(ω) and ε=ε(ω). Observing x hence sets a massive constraint on the preimage ω and on the conditional distribution of ε=ε(ω). When the fiducial inference incorporates another level of randomness via an independent random variable ε’ and inverts x=R(θ,ε’) into θ=θ(x,ε’), assuming there is only one solution to the inversion, it modifies the nature of the underlying σ-algebra into something that is incompatible with the original model. Because of this sudden duplication of the random variates. While the inversion of this equation x=R(θ,ε’) gives an idea of the possible values of θ when ε varies according to its [prior] distribution, it does not account for the connection between x and ε. And does not turn the original parameter into a random variable with an implicit prior distribution.

As to conditional fiducial distributions, they are defined by inversion of x=R(θ,ε), under a certain constraint on θ, like C(θ)=0, which immediately raises a Pavlovian reaction in me, namely that since the curve C(θ)=0 has measure zero under the original fiducial distribution, how can this conditional solution be uniquely or at all defined. Or to avoid the Borel paradox mentioned in the paper. If I get the meaning of the authors in this section, the resulting fiducial distribution will actually depend on the choice of σ-algebra governing the projection.

“A further advantage of the fiducial approach in the case of a simple fiducial model is that independent samples are produced directly from independent sampling from [the fiducial distribution]. Bayesian simulations most often come as dependent samples from a Markov chain.”

This side argument in “favour” of the fiducial approach is most curious as it brings into the picture computational aspects that do not have any reason to be there. (The core of the paper is concerned with the unicity of the fiducial distribution in some univariate settings. Not with computational issues.)