Archive for Likelihood Principle

my likelihood is dominating my prior [not!]

Posted in Kids, Statistics with tags , , , , , on August 29, 2019 by xi'an

An interesting misconception read on X validated today, with a confusion between the absolute value of the likelihood function and its variability. Which I have trouble explaining except possibly by the extrapolation from the discrete case and a confusion between the probability density of the data [scaled as a probability] and the likelihood function [scale-less]. I also had trouble convincing the originator of the question of the irrelevance of the scale of the likelihood per se, even when demonstrating that |đšș| could vanish from the posterior with no consequence whatsoever. It is only when I thought of the case when the likelihood is constant in 𝜃 that I managed to make my case.

R wins COPSS Award!

Posted in Statistics with tags , , , , , , , , on August 4, 2019 by xi'an

Hadley Wickham from RStudio has won the 2019 COPSS Award, which expresses a rather radical switch from the traditional recipient of this award in that this recognises his many contributions to the R language and in particular to RStudio. The full quote for the nomination is his  “influential work in statistical computing, visualisation, graphics, and data analysis” including “making statistical thinking and computing accessible to a large audience”. With the last part possibly a recognition of the appeal of Open Source… (I was not in Denver for the awards ceremony, having left after the ABC session on Monday morning. Unfortunately, this session only attracted a few souls, due to the competition of twentysome other sessions, including, excusez du peu!, David Dunson’s Medallion Lecture and Michael Lavine’s IOL on the likelihood principle. And Marco Ferreira’s short-course on Bayesian time series. This is the way the joint meeting goes, but it is disappointing to reach so few people.)

a hatchet job [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , on July 20, 2019 by xi'an

By happenstance, I came across a rather savage review of John Hartigan’s Bayes Theory (1984) written by Bruce Hill in HASA, including the following slivers:

“By and large this book is at its best in developing the mathematical consequences of the theory and at its worst when dealing with the underlying ideas and concepts, which seems unfortunate since Bayesian statistics is above all an attempt to deal realistically with the nature of uncertainty and decision making.” B. Hill, JASA, 1986, p.569

“Unfortunately, those who had hoped for a serious contribution to the question will be disappointed.” B. Hill, JASA, 1986, p.569

“If the primary concern is mathematical convenience, not content or meaning, then the enterprise is a very different matter from what most of us think of as Bayesian approach.” B. Hill, JASA, 1986, p.570

“Perhaps in a century or two statisticians and probabilists will reach a similar state of maturity.” B. Hill, JASA, 1986, p.570

Perhaps this is a good place to mention that the notation in the book is formidable. Bayes’s theorem appears in a form that is  almost unrecognizable. As elsewhere, the mathematical treatment is elegant. but none of the deeper issues about the meaning and interpretation of conditional probability is discussed.” B. Hill, JASA, 1986, p.570

“The reader will find many intriguing ideas, much that is outrageous, and even some surprises (the likelihood principle is not mentioned, and conditional inference is just barely mentioned).” B. Hill, JASA, 1986, p.571

What is disappointing to me is that with a little more discipline and effort with regard to the ideas underlying Bayesian statistics, this book could have been a major contribution to the theory.” B. Hill, JASA, 1986, p.571

Another review by William Sudderth (1985, Bulletin of the American Mathematical Society) is much kinder to the book, except for the complaint that “the pace is brisk and sometimes hard to follow”.

a generalized representation of Bayesian inference

Posted in Books with tags , , , , , , on July 5, 2019 by xi'an

Jeremias Knoblauch, Jack Jewson and Theodoros Damoulas, all affiliated with Warwick (hence a potentially biased reading!), arXived a paper on loss-based Bayesian inference that Jack discussed with me on my last visit to Warwick. As I was somewhat scared by the 61 pages, of which the 8 first pages are in NeurIPS style. The authors argue for a decision-theoretic approach to Bayesian inference that involves a loss over distributions and a divergence from the prior. For instance, when using the log-score as the loss and the Kullback-Leibler divergence, the regular posterior emerges, as shown by Arnold Zellner. Variational inference also falls under this hat. The argument for this generalization is that any form of loss can be used and still returns a distribution that is used to assess uncertainty about the parameter (of interest). In the axioms they produce for justifying the derivation of the optimal procedure, including cases where the posterior is restricted to a certain class, one [Axiom 4] generalizes the likelihood principle. Given the freedom brought by this general framework, plenty of fringe Bayes methods like standard variational Bayes can be seen as solutions to such a decision problem. Others like EP do not. Of interest to me are the potentials for this formal framework to encompass misspecification and likelihood-free settings, as well as for assessing priors, which is always a fishy issue. (The authors mention in addition the capacity to build related specific design Bayesian deep networks, of which I know nothing.) The obvious reaction of mine is one of facing an abundance of wealth (!) but encompassing approximate Bayesian solutions within a Bayesian framework remains an exciting prospect.

Computational Bayesian Statistics [book review]

Posted in Books, Statistics with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , , on February 1, 2019 by xi'an

This Cambridge University Press book by M. AntĂłnia Amaral Turkman, Carlos Daniel Paulino, and Peter MĂŒller is an enlarged translation of a set of lecture notes in Portuguese. (Warning: I have known Peter MĂŒller from his PhD years in Purdue University and cannot pretend to perfect objectivity. For one thing, Peter once brought me frozen-solid beer: revenge can also be served cold!) Which reminds me of my 1994 French edition of MĂ©thodes de Monte Carlo par chaĂźnes de Markov, considerably upgraded into Monte Carlo Statistical Methods (1998) thanks to the input of George Casella. (Re-warning: As an author of books on the same topic(s), I can even less pretend to objectivity.)

“The “great idea” behind the development of computational Bayesian statistics is the recognition that Bayesian inference can be implemented by way of simulation from the posterior distribution.”

The book is written from a strong, almost militant, subjective Bayesian perspective (as, e.g., when half-Bayesians are mentioned!). Subjective (and militant) as in Dennis Lindley‘s writings, eminently quoted therein. As well as in Tony O’Hagan‘s. Arguing that the sole notion of a Bayesian estimator is the entire posterior distribution. Unless one brings in a loss function. The book also discusses the Bayes factor in a critical manner, which is fine from my perspective.  (Although the ban on improper priors makes its appearance in a very indirect way at the end of the last exercise of the first chapter.)

Somewhat at odds with the subjectivist stance of the previous chapter, the chapter on prior construction only considers non-informative and conjugate priors. Which, while understandable in an introductory book, is a wee bit disappointing. (When mentioning Jeffreys’ prior in multidimensional settings, the authors allude to using univariate Jeffreys’ rules for the marginal prior distributions, which is not a well-defined concept or else Bernardo’s and Berger’s reference priors would not have been considered.) The chapter also mentions the likelihood principle at the end of the last exercise, without a mention of the debate about its derivation by Birnbaum. Or Deborah Mayo’s recent reassessment of the strong likelihood principle. The following chapter is a sequence of illustrations in classical exponential family models, classical in that it is found in many Bayesian textbooks. (Except for the Poison model found in Exercise 3.3!)

Nothing to complain (!) about the introduction of Monte Carlo methods in the next chapter, especially about the notion of inference by Monte Carlo methods. And the illustration by Bayesian design. The chapter also introduces Rao-Blackwellisation [prior to introducing Gibbs sampling!]. And the simplest form of bridge sampling. (Resuscitating the weighted bootstrap of Gelfand and Smith (1990) may not be particularly urgent for an introduction to the topic.) There is furthermore a section on sequential Monte Carlo, including the Kalman filter and particle filters, in the spirit of Pitt and Shephard (1999). This chapter is thus rather ambitious in the amount of material covered with a mere 25 pages. Consensus Monte Carlo is even mentioned in the exercise section.

“This and other aspects that could be criticized should not prevent one from using this [Bayes factor] method in some contexts, with due caution.”

Chapter 5 turns back to inference with model assessment. Using Bayesian p-values for model assessment. (With an harmonic mean spotted in Example 5.1!, with no warning about the risks, except later in 5.3.2.) And model comparison. Presenting the whole collection of xIC information criteria. from AIC to WAIC, including a criticism of DIC. The chapter feels somewhat inconclusive but methinks this is the right feeling on the current state of the methodology for running inference about the model itself.

“Hint: There is a very easy answer.”

Chapter 6 is also a mostly standard introduction to Metropolis-Hastings algorithms and the Gibbs sampler. (The argument given later of a Metropolis-Hastings algorithm with acceptance probability one does not work.) The Gibbs section also mentions demarginalization as a [latent or auxiliary variable] way to simulate from complex distributions [as we do], but without defining the notion. It also references the precursor paper of Tanner and Wong (1987). The chapter further covers slice sampling and Hamiltonian Monte Carlo, the later with sufficient details to lead to reproducible implementations. Followed by another standard section on convergence assessment, returning to the 1990’s feud of single versus multiple chain(s). The exercise section gets much larger than in earlier chapters with several pages dedicated to most problems. Including one on ABC, maybe not very helpful in this context!

“…dimension padding (…) is essentially all that is to be said about the reversible jump. The rest are details.”

The next chapter is (somewhat logically) the follow-up for trans-dimensional problems and marginal likelihood approximations. Including Chib’s (1995) method [with no warning about potential biases], the spike & slab approach of George and McCulloch (1993) that I remember reading in a cafĂ© at the University of Wyoming!, the somewhat antiquated MCÂł of Madigan and York (1995). And then the much more recent array of Bayesian lasso techniques. The trans-dimensional issues are covered by the pseudo-priors of Carlin and Chib (1995) and the reversible jump MCMC approach of Green (1995), the later being much more widely employed in the literature, albeit difficult to tune [and even to comprehensively describe, as shown by the algorithmic representation in the book] and only recommended for a large number of models under comparison. Once again the exercise section is most detailed, with recent entries like the EM-like variable selection algorithm of RočkovĂĄ and George (2014).

The book also includes a chapter on analytical approximations, which is also the case in ours [with George Casella] despite my reluctance to bring them next to exact (simulation) methods. The central object is the INLA methodology of Rue et al. (2009) [absent from our book for obvious calendar reasons, although Laplace and saddlepoint approximations are found there as well]. With a reasonable amount of details, although stopping short of implementable reproducibility. Variational Bayes also makes an appearance, mostly following the very recent Blei et al. (2017).

The gem and originality of the book are primarily to be found in the final and ninth chapter where four software are described, all with interfaces to R: OpenBUGS, JAGS, BayesX, and Stan, plus R-INLA which is processed in the second half of the chapter (because this is not a simulation method). As in the remainder of the book, the illustrations are related to medical applications. Worth mentioning is the reminder that BUGS came in parallel with Gelfand and Smith (1990) Gibbs sampler rather than as a consequence. Even though the formalisation of the Markov chain Monte Carlo principle by the later helped in boosting the power of this software. (I also appreciated the mention made of Sylvia Richardson’s role in this story.) Since every software is illustrated in depth with relevant code and output, and even with the shortest possible description of its principle and modus vivendi, the chapter is 60 pages long [and missing a comparative conclusion]. Given my total ignorance of the very existence of the BayesX software, I am wondering at the relevance of its inclusion in this description rather than, say, other general R packages developed by authors of books such as Peter Rossi. The chapter also includes a description of CODA, with an R version developed by Martin Plummer [now a Warwick colleague].

In conclusion, this is a high-quality and all-inclusive introduction to Bayesian statistics and its computational aspects. By comparison, I find it much more ambitious and informative than Albert’s. If somehow less pedagogical than the thicker book of Richard McElreath. (The repeated references to Paulino et al.  (2018) in the text do not strike me as particularly useful given that this other book is written in Portuguese. Unless an English translation is in preparation.)

Disclaimer: this book was sent to me by CUP for endorsement and here is what I wrote in reply for a back-cover entry:

An introduction to computational Bayesian statistics cooked to perfection, with the right mix of ingredients, from the spirited defense of the Bayesian approach, to the description of the tools of the Bayesian trade, to a definitely broad and very much up-to-date presentation of Monte Carlo and Laplace approximation methods, to an helpful description of the most common software. And spiced up with critical perspectives on some common practices and an healthy focus on model assessment and model selection. Highly recommended on the menu of Bayesian textbooks!

And this review is likely to appear in CHANCE, in my book reviews column.

severe testing : beyond Statistics wars?!

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , on January 7, 2019 by xi'an

A timely start to my reading Deborah Mayo’s [properly printed] Statistical Inference as Severe Testing (How to get beyond the Statistics Wars) on the Armistice Day, as it seems to call for just this, an armistice! And the opportunity of a long flight to Oaxaca in addition… However, this was only the start and it took me several further weeks to peruse seriously enough the book (SIST) before writing the (light) comments below. (Receiving a free copy from CUP and then a second one directly from Deborah after I mentioned the severe sabotage!)

Indeed, I sort of expected a different content when taking the subtitle How to get beyond the Statistics Wars at face value. But on the opposite the book is actually very severely attacking anything not in the line of the Cox-Mayo severe testing line. Mostly Bayesian approach(es) to the issue! For instance, Jim Berger’s construct of his reconciliation between Fisher, Neyman, and Jeffreys is surgically deconstructed over five pages and exposed as a Bayesian ploy. Similarly, the warnings from Dennis Lindley and other Bayesians that the p-value attached with the Higgs boson experiment are not probabilities that the particle does not exist are met with ridicule. (Another go at Jim’s Objective Bayes credentials is found in the squared myth of objectivity chapter. Maybe more strongly than against staunch subjectivists like Jay Kadane. And yet another go when criticising the Berger and Sellke 1987 lower bound results. Which even extends to Vale Johnson’s UMP-type Bayesian tests.)

“Inference should provide posterior probabilities, final degrees of support, belief, probability (…) not provided by Bayes factors.” (p.443)

Another subtitle of the book could have been testing in Flatland given the limited scope of the models considered with one or at best two parameters and almost always a Normal setting. I have no idea whatsoever how the severity principle would apply in more complex models, with e.g. numerous nuisance parameters. By sticking to the simplest possible models, the book can carry on with the optimality concepts of the early days, like sufficiency (p.147) and and monotonicity and uniformly most powerful procedures, which only make sense in a tiny universe.

“The estimate is really a hypothesis about the value of the parameter.  The same data warrant the hypothesis constructed!” (p.92)

There is an entire section on the lack of difference between confidence intervals and the dual acceptance regions, although the lack of unicity in defining either of them should come as a bother. Especially outside Flatland. Actually the following section, from p.193 onward, reminds me of fiducial arguments, the more because Schweder and Hjort are cited there. (With a curve like Fig. 3.3. operating like a cdf on the parameter Ό but no dominating measure!)

“The Fisher-Neyman dispute is pathological: there’s no disinterring the truth of the matter (…) Fisher grew to renounce performance goals he himself had held when it was found that fiducial solutions disagreed with them.”(p.390)

Similarly the chapter on the “myth of the “the myth of objectivity””(p.221) is mostly and predictably targeting Bayesian arguments. The dismissal of Frank Lad’s arguments for subjectivity ends up [or down] with a rather cheap that it “may actually reflect their inability to do the math” (p.228). [CoI: I once enjoyed a fantastic dinner cooked by Frank in Christchurch!] And the dismissal of loss function requirements in Ziliak and McCloskey is similarly terse, if reminding me of Aris Spanos’ own arguments against decision theory. (And the arguments about the Jeffreys-Lindley paradox as well.)

“It’s not clear how much of the current Bayesian revolution is obviously Bayesian.” (p.405)

The section (Tour IV) on model uncertainty (or against “all models are wrong”) is somewhat limited in that it is unclear what constitutes an adequate (if wrong) model. And calling for the CLT cavalry as backup (p.299) is not particularly convincing.

It is not that everything is controversial in SIST (!) and I found agreement in many (isolated) statements. Especially in the early chapters. Another interesting point made in the book is to question whether or not the likelihood principle at all makes sense within a testing setting. When two models (rather than a point null hypothesis) are X-examined, it is a rare occurrence that the likelihood factorises any further than the invariance by permutation of iid observations. Which reminded me of our earlier warning on the dangers of running ABC for model choice based on (model specific) sufficient statistics. Plus a nice sprinkling of historical anecdotes, esp. about Neyman’s life, from Poland, to Britain, to California, with some time in Paris to attend Borel’s and Lebesgue’s lectures. Which is used as a background for a play involving Bertrand, Borel, Neyman and (Egon) Pearson. Under the title “Les Miserables Citations” [pardon my French but it should be Les MisĂ©rables if Hugo is involved! Or maybe les gilets jaunes…] I also enjoyed the sections on reuniting Neyman-Pearson with Fisher, while appreciating that Deborah Mayo wants to stay away from the “minefields” of fiducial inference. With, mot interestingly, Neyman himself trying in 1956 to convince Fisher of the fallacy of the duality between frequentist and fiducial statements (p.390). Wisely quoting Nancy Reid at BFF4 stating the unclear state of affair on confidence distributions. And the final pages reawakened an impression I had at an earlier stage of the book, namely that the ABC interpretation on Bayesian inference in Rubin (1984) could come closer to Deborah Mayo’s quest for comparative inference (p.441) than she thinks, in that producing parameters producing pseudo-observations agreeing with the actual observations is an “ability to test accordance with a single model or hypothesis”.

“Although most Bayesians these days disavow classic subjective Bayesian foundations, even the most hard-nosed. “we’re not squishy” Bayesian retain the view that a prior distribution is an important if not the best way to bring in background information.” (p.413)

A special mention to Einstein’s cafe (p.156), which reminded me of this picture of Einstein’s relative Cafe I took while staying in Melbourne in 2016… (Not to be confused with the Markov bar in the same city.) And a fairly minor concern that I find myself quoted in the sections priors: a gallimaufry (!) and… Bad faith Bayesianism (!!), with the above qualification. Although I later reappear as a pragmatic Bayesian (p.428), although a priori as a counter-example!

Measuring statistical evidence using relative belief [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , , , on July 22, 2015 by xi'an

“It is necessary to be vigilant to ensure that attempts to be mathematically general do not lead us to introduce absurdities into discussions of inference.” (p.8)

This new book by Michael Evans (Toronto) summarises his views on statistical evidence (expanded in a large number of papers), which are a quite unique mix of Bayesian  principles and less-Bayesian methodologies. I am quite glad I could receive a version of the book before it was published by CRC Press, thanks to Rob Carver (and Keith O’Rourke for warning me about it). [Warning: this is a rather long review and post, so readers may chose to opt out now!]

“The Bayes factor does not behave appropriately as a measure of belief, but it does behave appropriately as a measure of evidence.” (p.87)

Continue reading