a general framework for updating belief functions

Posted in Books, Statistics, University life on July 15, 2013 by xi'an

Pier Giovanni Bissiri, Chris Holmes and Stephen Walker have recently arXived the paper related to Stephen’s talk in London for Bayes 250. When I heard the talk (of which some slides are included below), my interest was aroused by the facts that (a) the approach they investigated could start from a statistic, rather than from a full model, with obvious implications for ABC, & (b) the starting point could be the dual to the prior × likelihood pair, namely the loss function. I thus read the paper with this in mind. (And rather quickly, which may mean I skipped important aspects. For instance, I did not get into Section 4 to any depth. Disclaimer: I was not, and am not, a referee for this paper!)

The core idea is to stick to a Bayesian (hardcore?) line when missing the full model, i.e. the likelihood of the data, but wishing to infer about a well-defined parameter like the median of the observations. This parameter is model-free, yet some degree of prior information is available in the form of a prior distribution. (This is thus the dual of frequentist inference: instead of a likelihood w/o a prior, they have a prior w/o a likelihood!) The approach in the paper is to define a “posterior” by using a functional type of loss function that balances fidelity to prior and fidelity to data. The prior part (of the loss) ends up with a Kullback-Leibler loss, while the data part (of the loss) is an expected loss wrt l(θ,x), ending up with the definition of a “posterior” that is

$\exp\{ -l(\theta,x)\} \pi(\theta)$

the loss thus playing the role of the log-likelihood.

I very much like the problem addressed in the paper, as I think it is connected with the real world and the complex modelling issues we face nowadays. I also like the insistence on coherence like the updating principle when switching former posterior for new prior (a point sorely missed in this book!) The distinction between M-closed, M-open, and M-free scenarios is worth mentioning, if only as an entry to the Bayesian processing of pseudo-likelihood and proxy models. I am however not entirely convinced by the solution presented therein, in that it involves a rather large degree of arbitrariness. In other words, while I agree on using the loss function as a pivot for defining the pseudo-posterior, I am reluctant to put the same faith in the loss as in the log-likelihood (maybe a frequentist atavistic gene somewhere…) In particular, I think some of the choices are either hard or impossible to make and remain unprincipled (despite a call to the LP on page 7). I also consider the M-open case as remaining unsolved as finding a convergent assessment about the pseudo-true parameter brings little information about the real parameter and the lack of fit of the superimposed model. Given my great expectations, I ended up being disappointed by the M-free case: there is no optimal choice for the substitute to the loss function that sounds very much like a pseudo-likelihood (or log thereof). (I thought the talk was more conclusive about this, I presumably missed a slide there!) Another great expectation was to read about the proper scaling of the loss function (since L and wL are difficult to separate, except for monetary losses). The authors propose a “correct” scaling based on balancing both forms of faithfulness (to the prior and to the data) for a single observation, but this is not a completely tight argument (dependence on parametrisation and prior, notion of a single observation, &tc.)
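To make the construction concrete, here is a minimal sketch (not taken from the paper) of the “posterior” exp{−w l(θ,x)}π(θ) for the median, computed on a grid. The absolute-error loss, the N(0,10²) prior, the simulated Cauchy data, the default weight w=1 and the helper name `gibbs_update` are all illustrative assumptions. The last lines check the updating principle numerically: processing the two halves of the sample in turn, with the first “posterior” serving as the new prior, should return the same answer as a single update on the whole sample.

```python
import numpy as np

def gibbs_update(theta_grid, log_prior_vals, loss, data, w=1.0):
    """Grid weights proportional to exp{-w * sum_i loss(theta, x_i)} * prior(theta).

    `w` is the (arbitrary) scale attached to the loss; hypothetical helper.
    """
    total_loss = np.array([loss(t, data).sum() for t in theta_grid])
    log_post = log_prior_vals - w * total_loss
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()            # normalise over the grid

rng = np.random.default_rng(0)
data = rng.standard_cauchy(50) + 2.0    # heavy-tailed sample with median 2

theta_grid = np.linspace(-5.0, 10.0, 1501)
log_prior_vals = -0.5 * (theta_grid / 10.0) ** 2    # N(0, 10^2), up to a constant

def abs_loss(t, x):
    return np.abs(x - t)                # loss minimised (in expectation) at the median

# single update on the whole sample
post = gibbs_update(theta_grid, log_prior_vals, abs_loss, data)
post_mean = float(np.sum(theta_grid * post))

# sequential update: first half, then second half with the first output as prior
post_first = gibbs_update(theta_grid, log_prior_vals, abs_loss, data[:25])
post_seq = gibbs_update(theta_grid, np.log(post_first), abs_loss, data[25:])

coherent = bool(np.allclose(post, post_seq))
```

Changing `w` rescales the loss and hence the spread of the resulting distribution, which is exactly the calibration issue raised above: nothing in the data alone separates L from wL.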

The illustration section contains two examples, one of which is a full-size or at least challenging genetic data analysis. The loss function is based on a logistic pseudo-likelihood and it provides results where the Bayes factor is in agreement with a likelihood ratio test using Cox’s proportional hazards model. The issue about keeping the baseline function as unknown reminded me of the Robins-Wasserman paradox Jamie discussed in Varanasi. The second example offers a nice feature of putting uncertainties onto box-plots, although I cannot trust very much the 95% of the credible sets. (And I do not understand why a unique loss would come to be associated with the median parameter, see p.25.)

Watch out: Tomorrow’s post contains a reply from the authors!

Posted in Statistics on January 28, 2013 by xi'an

Last Monday, my student Li Chenlu presented the foundational 1962 JASA paper by Allan Birnbaum, On the Foundations of Statistical Inference. The very paper that derives the Likelihood Principle from the cumulated Conditional and Sufficiency principles and that had been discussed [maybe ad nauseam] on this ‘Og!!! Alas, thrice alas!, I was still stuck in the plane flying back from Atlanta as she was presenting her understanding of the paper, as the flight had been delayed four hours thanks to (or rather woe to!) the weather conditions in Paris the day before (chain reaction…).

I am sorry I could not attend this lecture and this for many reasons: first and foremost, I wanted to attend every talk from my students both out of respect for them and to draw a comparison between their performances. My PhD student Sofia ran the seminar that day in my stead, for which I am quite grateful, but I do wish I had been there… Second, this a.s. has been the most philosophical paper in the series, and I would have appreciated giving the proper light on the reasons for and the consequences of this paper as Li Chenlu stuck very much to the paper itself. (She provided additional references in the conclusion but they did not seem to impact the slides.) Discussing for instance Berger and Wolpert’s (1988) new lights on the topic, as well as Deborah Mayo’s (2010) attacks, and even Chang’s (2012) misunderstandings, would have clearly helped the students.

empirical Bayes (CHANCE)

Posted in Books, Statistics, University life on April 23, 2012 by xi'an

As I decided to add a vignette on empirical Bayes methods to my review of Brad Efron’s Large-scale Inference in the next issue of CHANCE [25(3)], here it is.

Empirical Bayes methods can crudely be seen as the poor man’s Bayesian analysis. They start from a Bayesian modelling, for instance the parameterised prior

$x\sim f(x|\theta)\,,\quad \theta\sim\pi(\theta|\alpha)$

and then, instead of setting α to a specific value or of assigning a hyperprior to this hyperparameter α, as in a regular or a hierarchical Bayes approach, the empirical Bayes paradigm consists in estimating α from the data. Hence the “empirical” label. The reference model used for the estimation is the integrated likelihood (or conditional marginal)

$m(x|\alpha) = \int f(x|\theta) \pi(\theta|\alpha)\,\text{d}\theta$

which defines a distribution density indexed by α and thus allows for the use of any statistical estimation method (moments, maximum likelihood or even Bayesian!). A classical example is provided by the normal exchangeable sample: if

$x_i\sim \mathcal{N}(\theta_i,\sigma^2)\qquad \theta_i\sim \mathcal{N}(\mu,\tau^2)\quad i=1,\ldots,p$

then, marginally,

$x_i \sim \mathcal{N}(\mu,\tau^2+\sigma^2)$

and μ can be estimated by the empirical average of the observations. The next step in an empirical Bayes analysis is to act as if α had not been estimated from the data and to conduct a regular Bayesian processing of the data with this estimated prior distribution. In the above normal example, this means estimating the θi’s by

$\dfrac{\sigma^2 \bar{x} + \tau^2 x_i}{\sigma^2+\tau^2}$

with the characteristic shrinkage (to the average) property of the resulting estimator (Efron and Morris, 1973).
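As a quick numerical illustration of this normal example, here is a minimal sketch on simulated data. The sample size, the “true” values of μ and τ², the assumption that σ² is known, and the moment estimate of τ² (sample variance minus σ², truncated at zero) are all illustrative choices of mine rather than part of the vignette, which only spells out the estimation of μ.

```python
import numpy as np

rng = np.random.default_rng(1)

p, sigma2 = 200, 1.0                    # number of means, known sampling variance
mu_true, tau2_true = 3.0, 4.0           # hyperparameters used only to simulate

theta = rng.normal(mu_true, np.sqrt(tau2_true), size=p)   # theta_i ~ N(mu, tau^2)
x = rng.normal(theta, np.sqrt(sigma2))                     # x_i ~ N(theta_i, sigma^2)

# Marginally x_i ~ N(mu, tau^2 + sigma^2), so the hyperparameters can be
# estimated by the method of moments from the x_i's alone.
mu_hat = x.mean()
tau2_hat = max(x.var(ddof=1) - sigma2, 0.0)

# Plug-in posterior mean: each x_i is pulled towards the overall average.
theta_eb = (sigma2 * mu_hat + tau2_hat * x) / (sigma2 + tau2_hat)

# Squared-error comparison with the unshrunk estimates x_i.
mse_raw = float(np.mean((x - theta) ** 2))
mse_eb = float(np.mean((theta_eb - theta) ** 2))
print(f"raw MSE: {mse_raw:.3f}, empirical Bayes MSE: {mse_eb:.3f}")
```

By construction, every estimate sits between x_i and the overall average; the comparison of squared errors then shows, on a given simulated sample, how much (if anything) the shrinkage buys.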

“…empirical Bayes isn’t Bayes.” B. Efron (p.90)

While using Bayesian tools, this technique is outside of the Bayesian paradigm for several reasons: (a) the prior depends on the data, hence it lacks foundational justifications; (b) the prior varies with the data, hence it lacks theoretical validations like Wald’s complete class theorem; (c) the prior uses the data once, hence the posterior uses the data twice (see the vignette about this sin in the previous issue); (d) the prior relies on an estimator, whose variability is not accounted for in the subsequent analysis (Morris, 1983). The original motivation for the approach (Robbins, 1955) was more non-parametric; however, it gained popularity in the 70’s and 80’s both in conjunction with the Stein effect and as a practical means of bypassing complex Bayesian computations. As illustrated by Efron’s book, it recently met with renewed interest in connection with multiple testing.

Bayes posterior just quick and dirty on X’idated

Posted in Statistics, Travel, University life on February 22, 2012 by xi'an

As a coincidence, I noticed that Don Fraser’s recent discussion paper “Is Bayes posterior just quick and dirty confidence?” will be discussed this Friday (18:00 UTC) on the Cross Validated Journal Club. I do not know whether or not to interpret the information “The author confirmed his presence at the event” as meaning Don Fraser will be online to discuss his paper with X’ed members. Feel free to join anyway if you have 20 reputation points or plan to get those by Friday! (I will be in the train coming back from Oxford. Oxford, England, not Mississippi!)

Principles of Applied Statistics

Posted in Books, Statistics, University life on February 13, 2012 by xi'an

This book by David Cox and Christl Donnelly, Principles of Applied Statistics, is an extensive coverage of all the necessary steps and precautions one must go through when contemplating applied (i.e. real!) statistics. As the authors write in the very first sentence of the book, “applied statistics is more than data analysis” (p.i); the title could indeed have been “Principled Data Analysis”! Indeed, Principles of Applied Statistics reminded me of how much we (at least I) take “the model” and “the data” for granted when doing statistical analyses, by going through all the pre-data and post-data steps that lead to the “idealized” (p.188) data analysis. The contents of the book are intentionally simple, with hardly any mathematical aspect, but with a clinical attention to exhaustiveness and clarity. For instance, even though I would have enjoyed more stress on probabilistic models as the basis for statistical inference, they only appear in the fourth chapter (out of ten) with errors-in-variables models. The painstakingly careful coverage of the myriad of tiny but essential steps involved in a statistical analysis and the highlighting of the numerous corresponding pitfalls were certainly illuminating to me. Just as the book refrains from mathematical digressions (“our emphasis is on the subject-matter, not on the statistical techniques as such”, p.12), it falls short of engaging in detailed and complex data stories. Instead, it uses little grey boxes to convey the pertinent aspects of a given data analysis, referring to a paper for the full story. (I acknowledge this may be frustrating at times, as one would like to read more…) The book reads very nicely and smoothly, and I must acknowledge I read most of it in trains, métros, and planes over the past week. (This remark is not intended as a criticism against a lack of depth or interest, by all means [and medians]!)

“A general principle, sounding superficial but difficult to implement, is that analyses should be as simple as possible, but not simpler.” (p.9)

To get into more details, Principles of Applied Statistics covers (most of!) the purposes of statistical analyses (Chap. 1), design with some special emphasis (Chap. 2-3), which is not surprising given the record of the authors (and “not a moribund art form”, p.51), measurement (Chap. 4), including the special case of latent variables and their role in model formulation, preliminary analysis (Chap. 5) by which the authors mean data screening and graphical pre-analysis, [at last!] models (Chap. 6-7), separated in model formulation [debating the nature of probability] and model choice, the latter being somewhat separated from the standard meaning of the term (done in §8.4.5 and §8.4.6), formal [mathematical] inference (Chap. 8), covering in particular testing and multiple testing, interpretation (Chap. 9), i.e. post-processing, and a final epilogue (Chap. 10). The readership of the book is rather broad, from practitioners to students, although both categories do require a good dose of maturity, to teachers, to scientists designing experiments with a statistical mind. It may be deemed too philosophical by some, too allusive by others, but I think it constitutes a magnificent testimony to the depth and to the spectrum of our field.

“Of course, all choices are to some extent provisional.” (p.130)

As a personal aside, I appreciated the illustration through capture-recapture models (p.36) with a remark on the impact of toe-clipping on frogs, as it reminded me of a similar way of marking lizards when my (then) student Jérôme Dupuis was working on a corresponding capture-recapture dataset in the 90’s. Conversely, while John Snow’s story [of using maps to explain the cause of cholera] is alluring, and his map makes for a great cover, I am less convinced it is particularly relevant within this book.

“The word Bayesian, however, became more widely used, sometimes representing a regression to the older usage of flat prior distributions supposedly representing initial ignorance, sometimes meaning models in which the parameters of interest are regarded as random variables and occasionally meaning little more than that the laws of probability are somewhere invoked.” (p.144)

My main quibble with the book goes, most unsurprisingly!, with the processing of Bayesian analysis found in Principles of Applied Statistics (pp.143-144). Indeed, on the one hand, the method is mostly criticised over those two pages. On the other hand, it is the only method presented with this level of detail, including historical background, which seems a bit superfluous for a treatise on applied statistics. The drawbacks mentioned are (p.144)

• the weight of prior information or modelling as “evidence”;
• the impact of “indifference or ignorance or reference priors”;
• whether or not empirical Bayes modelling has been used to construct the prior;
• whether or not the Bayesian approach is anything more than a “computationally convenient way of obtaining confidence intervals”

The empirical Bayes perspective is the original one found in Robbins (1956) and seems to find grace in the authors’ eyes (“the most satisfactory formulation”, p.156). Contrary to MCMC methods, “a black box in that typically it is unclear which features of the data are driving the conclusions” (p.149)…

“If an issue can be addressed nonparametrically then it will often be better to tackle it parametrically; however, if it cannot be resolved nonparametrically then it is usually dangerous to resolve it parametrically.” (p.96)

Apart from a more philosophical paragraph on the distinction between machine learning and statistical analysis in the final chapter, with the drawback of using neural nets and such as black-box methods (p.185), there is relatively little coverage of non-parametric models, the choice of “parametric formulations” (p.96) being openly made. I can somehow understand this perspective for simpler settings, namely that nonparametric models offer little explanation of the production of the data. However, in more complex models, nonparametric components often are a convenient way to evacuate burdensome nuisance parameters… Again, technical aspects are not the focus of Principles of Applied Statistics so this also explains why it does not dwell intently on nonparametric models.

“A test of meaningfulness of a possible model for a data-generating process is whether it can be used directly to simulate data.” (p.104)

The above remark is quite interesting, especially when accounting for David Cox’s current appreciation of ABC techniques. The impossibility of generating from a posited model, as with some models found in econometrics, precludes using ABC, but this does not necessarily mean the model should be excluded as unrealistic…

“The overriding general principle is that there should be a seamless flow between statistical and subject-matter considerations.” (p.188)

As mentioned earlier, the last chapter brings a philosophical conclusion on what (applied) statistics is. It stresses the need for a careful and principled use of black-box methods so that they preserve a general framework and lead to explicit interpretations.

May I believe I am a Bayesian?!

Posted in Books, Statistics, University life on January 21, 2012 by xi'an

“…the argument is false that because some ideal form of this approach to reasoning seems excellent in theory it therefore follows that in practice using this and only this approach to reasoning is the right thing to do.” Stephen Senn, 2011

Deborah Mayo, Aris Spanos, and Kent Staley have edited a special issue of Rationality, Markets and Morals (RMM) (a rather weird combination, esp. for a journal name!) on “Statistical Science and Philosophy of Science: Where Do (Should) They Meet in 2011 and Beyond?” for which comments are open. Stephen Senn has a paper therein entitled You May Believe You Are a Bayesian But You Are Probably Wrong in his usual witty, entertaining, and… Bayesian-bashing style! I find it very kind of him to allow us to remain in the wrong, very kind indeed…

Now, the paper somehow intersects with the comments Stephen made on our review of Harold Jeffreys’ Theory of Probability a while ago. It contains a nice introduction to the four great systems of statistical inference, embodied by de Finetti, Fisher, Jeffreys, and Neyman plus Pearson. The main criticism of Bayesianism à la de Finetti is that it is so perfect as to be outworldish. And, since this perfection is lost in the practical implementation, there is no compelling reason to be a Bayesian. Worse, all practical Bayesian implementations conflict with Bayesian principles. Hence a Bayesian author “in practice is wrong”. Stephen concludes with a call for eclecticism, quite in line with his usual style since this is likely to antagonise everyone. (I wonder whether or not having no final dot to the paper has a philosophical meaning. Since I have been caught in over-interpreting book covers, I will not say more!) As I will try to explain below, I believe Stephen has paradoxically himself fallen victim to over-theorising/philosophising! (Referring the interested reader to the above post as well as to my comments on Don Fraser’s “Is Bayes posterior just quick and dirty confidence?” for more related points. Esp. about Senn’s criticisms of objective Bayes on page 52 that are not so central to this discussion… Same thing for the different notions of probability [p.49] and the relative difficulties of the terms in (2) [p.50]. Deborah Mayo has a “deconstructed” version of Stephen’s paper on her blog, with a much deeper if deBayesian philosophical discussion. And then Andrew Jaffe wrote a post in reply to Stephen’s paper. Whose points I cannot discuss for lack of time, but with an interesting mention of Jaynes as missing in Senn’s pantheon.)

“The Bayesian theory is a theory on how to remain perfect but it does not explain how to become good.” Stephen Senn, 2011

While associating theories with characters is a reasonable rhetorical device, especially with large scale characters as the ones above!, I think it deters the reader from a philosophical questioning on the theory behind the (big) man. (In fact, it is a form of bullying or, more politely (?), of having big names shoved down your throat as a form of argument.) In particular, Stephen freezes the (Bayesian reasoning about the) Bayesian paradigm in its de Finetti phase-state, arguing about what de Finetti thought and believed. While this is historically interesting, I do not see why we should care at the praxis level. (I have made similar comments on this blog about the unpleasant aspects of being associated with one character, esp. the mysterious Reverend Bayes!) But this is not my main point.

“…in practice things are not so simple.” Stephen Senn, 2011

The core argument in Senn’s diatribe is that reality is always more complex than the theory allows for and thus that a Bayesian has to compromise on her/his perfect theory with reality/practice in order to reach decisions. A kind of philosophical equivalent to Achilles and the tortoise. However, it seems to me that the very fact that the Bayesian paradigm is a learning principle implies that imprecisions and imperfections are naturally built into the decision process. Thus avoiding the apparent infinite regress (Regress ins Unendliche) of having to run a Bayesian analysis to derive the prior for the Bayesian analysis at the level below (which is how I interpret Stephen’s first paragraph in Section 3). By refusing the transformation of a perfect albeit ideal Bayesian into a practical if imperfect bayesian (or coherent learner or whatever name that does not sound like being a member of a sect!), Stephen falls short of incorporating the contrainte de réalité into his own paradigm. The further criticisms found about prior justification, construction, evaluation (pp.59-60) are also of that kind, namely preventing the statistician from incorporating a degree of (probabilistic) uncertainty into her/his analysis.

In conclusion, reading Stephen’s piece was a pleasant and thought-provoking moment. I am glad to be allowed to believe I am a Bayesian, even though I do not believe it is a belief! The praxis of thousands of scientists using Bayesian tools with their personal degree of subjective involvement is an evolving organism that reaches much further than the highly stylised construct of de Finetti (or of de Finetti restaged by Stephen!). And appropriately getting away from claims to being perfect or right. Or even being more philosophical.

principles of uncertainty

Posted in Books, R, Statistics, University life on October 14, 2011 by xi'an

“Bayes Theorem is a simple consequence of the axioms of probability, and is therefore accepted by all as valid. However, some who challenge the use of personal probability reject certain applications of Bayes Theorem.” J. Kadane, p.44

Principles of uncertainty by Joseph (“Jay”) Kadane (Carnegie Mellon University, Pittsburgh) is a profound and mesmerising book on the foundations and principles of subjectivist or behaviouristic Bayesian analysis. Jay Kadane wrote Principles of uncertainty over a period of several years and, more or less in his own words, it represents the legacy he wants to leave for the future. The book starts with a large section on Jay’s definition of a probability model, with rigorous mathematical derivations all the way to Lebesgue measure (or more exactly the McShane-Stieltjes measure). This section contains many side derivations that pertain to mathematical analysis, in order to explain the subtleties of infinite countable and uncountable sets, and the distinction between finitely additive and countably additive (probability) measures. Unsurprisingly, the role of utility is emphasized in this book that keeps stressing the personalistic entry to Bayesian statistics. Principles of uncertainty also contains a formal development on the validity of Markov chain Monte Carlo methods that is superb and missing in most equivalent textbooks. Overall, the book is a pleasure to read. And highly recommended for teaching as it can be used at many different levels.