Archive for E.T. Jaynes

Statistical rethinking [book review]

Posted in Books, Kids, R, Statistics, University life on April 6, 2016 by xi'an

Statistical Rethinking: A Bayesian Course with Examples in R and Stan is a new book by Richard McElreath that CRC Press sent me for review in CHANCE. While the book was already discussed on Andrew’s blog three months ago, and [rightly so!] enthusiastically recommended by Rasmus Bååth on Amazon, here are the reasons why I am quite impressed by Statistical Rethinking!

“Make no mistake: you will wreck Prague eventually.” (p.10)

While the book has a lot in common with Bayesian Data Analysis, from being in the same CRC series to adopting a pragmatic and weakly informative approach to Bayesian analysis, to supporting the use of Stan, it also nicely develops its own ecosystem and idiosyncrasies, with a noticeable Jaynesian bent. To start with, I like the highly personal style, with clear attempts to make the concepts memorable for students by resorting to external concepts. The best example is the call to the myth of the golem in the first chapter, which McElreath uses as a warning about the use of statistical models (which are almost anagrams of golems!). Golems and models [and robots, another concept invented in Prague!] are man-made devices that strive to accomplish the goal set for them without heeding the consequences of their actions. This first chapter of Statistical Rethinking sets the ground for the rest of the book and gets quite philosophical (albeit in a readable way!) as a result. In particular, there is a most coherent case against hypothesis testing, which by itself justifies the title of the book.

failures and uses of Jaynes’ principle of transformation groups

Posted in Books, Kids, R, Statistics, University life on April 14, 2015 by xi'an

This paper by Alon Drory was arXived last week, while I was at Columbia. It reassesses Jaynes' resolution of Bertrand's paradox, which finds three different probabilities for the same geometric event depending on the underlying σ-algebra (or definition of randomness!). Both Poincaré and Jaynes argued against Bertrand that there was only one acceptable solution under symmetry properties. Drory argues this is not the case!

“…contrary to Jaynes’ assertion, each of the classical three solutions of Bertrand’s problem (and additional ones as well!) can be derived by the principle of transformation groups, using the exact same symmetries, namely rotational, scaling and translational invariance.”

Drory rephrases the problem as follows: "In a circle, select at random a chord that is not a diameter. What is the probability that its length is greater than the side of the equilateral triangle inscribed in the circle?" Jaynes' solution is indifferent to the orientation of an observer with respect to the circle, to the radius of the circle, and to the location of the centre. The latter invariance is the one most discussed by Drory, who argues that it does not involve an observer but the random experiment itself, and that it relies on a specific version of straw throwing in Jaynes' argument, meaning other versions are also available. This reminded me of an earlier post on Buffon's needle and on the different ways the needle can be thrown over the floor, where I reflected on the connection with Bertrand's paradox and ran some further R experiments. Drory's alternative to Jaynes' manner of throwing straws is to impale them on darts and throw the darts first! (Which is the same as one of my needle solutions.)

“…the principle of transformation groups does not make the problem well-posed, and well-posing strategies that rely on such symmetry considerations ought therefore to be rejected.”

In short, the conclusion of the paper is that there is an indeterminacy in Bertrand’s problem that allows several resolutions under the principle of indifference that end up with a large range of probabilities, thus siding with Bertrand rather than Jaynes.
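For the record, the three classical answers to Bertrand's problem are easy to check by simulation. Here is a minimal Monte Carlo sketch (in Python, although the blog experiments mentioned above were in R); these are the three standard chord-selection methods, not Drory's constructions:

```python
import math
import random

def chord_random_endpoints(r=1.0):
    # Method 1: two independent uniform points on the circle.
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * r * abs(math.sin((a - b) / 2))

def chord_random_radius(r=1.0):
    # Method 2: a uniform point on a fixed radius as the chord's midpoint.
    d = random.uniform(0, r)
    return 2 * math.sqrt(r * r - d * d)

def chord_random_midpoint(r=1.0):
    # Method 3: the chord's midpoint uniform in the disc
    # (rejection sampling from the enclosing square).
    while True:
        x, y = random.uniform(-r, r), random.uniform(-r, r)
        if x * x + y * y < r * r:
            break
    d = math.sqrt(x * x + y * y)
    return 2 * math.sqrt(r * r - d * d)

def estimate(method, n=100_000, r=1.0):
    # Proportion of chords longer than the side of the
    # inscribed equilateral triangle, r * sqrt(3).
    side = r * math.sqrt(3)
    return sum(method(r) > side for _ in range(n)) / n
```

The three estimates converge to 1/3, 1/2, and 1/4 respectively, which is the paradox in a nutshell: each method is uniform in some respect, yet they disagree.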

Bayesian programming [book review]

Posted in Books, Kids, pictures, Statistics, University life on March 3, 2014 by xi'an

“We now think the Bayesian Programming methodology and tools are reaching maturity. The goal of this book is to present them so that anyone is able to use them. We will, of course, continue to improve tools and develop new models. However, pursuing the idea that probability is an alternative to Boolean logic, we now have a new important research objective, which is to design specific hardware, inspired from biology, to build a Bayesian computer.” (p.xviii)

On the plane to and from Montpellier, I took an extended look at Bayesian Programming, a CRC Press book recently written by Pierre Bessière, Emmanuel Mazer, Juan-Manuel Ahuactzin, and Kamel Mekhnacha. (Very nice picture of a fishing net on the cover, by the way!) Despite my initial excitement at seeing a book whose final goal is to achieve a Bayesian computer, as demonstrated by the above quote, I soon found the book too arid to read due to its highly formalised presentation… The contents are clear indications that the approach is useful, as they illustrate the use of Bayesian programming in different decision-making settings, including a collection of Python codes. The book thus brings an answer to the what, but it somehow misses the how, in that the construction of the priors and the derivation of the posteriors are not explained in a way one could replicate.

“A modeling methodology is not sufficient to run Bayesian programs. We also require an efficient Bayesian inference engine to automate the probabilistic calculus. This assumes we have a collection of inference algorithms adapted and tuned to more or less specific models and a software architecture to combine them in a coherent and unique tool.” (p.9)

For instance, all models therein are described via the curly brace formalism summarised by

which quickly turns into an unpalatable object, as in this example taken from the online PhD thesis of Gabriel Synnaeve (where he applied Bayesian programming principles to the real-time strategy game StarCraft and developed a bot, BroodwarBotQ, able to play it),

a thesis that I found most interesting!

“Consequently, we have 21 × 16 = 336 bell-shaped distributions and we have 2 × 21 × 16 = 672 free parameters: 336 means and 336 standard deviations.” (p.51)

Now, getting back to the topic of the book, I can see connections with statistical problems and models, and not only via the application of Bayes' theorem, when the purpose (or Question) is to take a decision, for instance in a robotic action. I still remain puzzled by the purpose of the book, since it starts with very low expectations on the reader, but hurries past notions like Kalman filters and Metropolis-Hastings algorithms in a few paragraphs. I do not get some of the details, like this notion of a discretised Gaussian distribution (I eventually found the place where the 672 prior parameters are "learned", in a phase called "identification").
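As far as I can tell, a "discretised Gaussian" here simply means evaluating the Gaussian density on a finite grid of values and renormalising so the weights sum to one. A minimal sketch of my reading of it (the grid and parameters below are made up, not taken from the book):

```python
import math

def discretised_gaussian(mean, sd, values):
    """Evaluate a N(mean, sd) density on a finite set of values
    and renormalise so the weights sum to one."""
    weights = [math.exp(-0.5 * ((v - mean) / sd) ** 2) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

# e.g. one of the bell-shaped distributions over 16 discrete bins,
# with a mean and standard deviation to be "learned" from data
probs = discretised_gaussian(mean=5.0, sd=2.0, values=range(16))
```

In the book's setting, each such distribution contributes one mean and one standard deviation to the pool of free parameters.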

“Thanks to conditional independence the curse of dimensionality has been broken! What has been shown to be true here for the required memory space is also true for the complexity of inferences. Conditional independence is the principal tool to keep the calculation tractable. Tractability of Bayesian inference computation is of course a major concern as it has been proved NP-hard (Cooper, 1990).”(p.74)

The final chapters (Chap. 14 on "Bayesian inference algorithms revisited", Chap. 15 on "Bayesian learning revisited" and Chap. 16 on "Frequently asked questions and frequently argued matters" [!]) are definitely those I found easiest to read and relate to, with mentions made of conjugate priors and of the EM algorithm as a (Bayes) classifier. The final chapter mentions BUGS, Hugin and… Stan! Plus a sequence of 23 PhD theses defended on Bayesian programming for robotics in the past 20 years. And it explains the authors' views on the difference between Bayesian programming and Bayesian networks ("any Bayesian network can be represented in the Bayesian programming formalism, but the opposite is not true", p.316), between Bayesian programming and probabilistic programming ("we do not search to extend classical languages but rather to replace them by a new programming approach based on probability", p.319), between Bayesian programming and Bayesian modelling ("Bayesian programming goes one step further", p.317), with a further (self-)justification of why the book sticks to discrete variables, and further, more philosophical sections referring to Jaynes and the principle of maximum entropy.

“The “objectivity” of the subjectivist approach then lies in the fact that two different subjects with same preliminary knowledge and same observations will inevitably reach the same conclusions.”(p.327)

Bayesian Programming thus provides a good snapshot of (or window on) what one can achieve in decision-making under uncertainty with Bayesian techniques. It reflects a long-term effort on those notions by Pierre Bessière, his colleagues, and his students. The topic is most likely too remote from my own interests for the above review to be complete. Therefore, if anyone is interested in reviewing this book any further for CHANCE, before I send the above to the journal, please contact me. (Usual provisions apply.)

MaxEnt 2013, Canberra, Dec. 15-20

Posted in Mountains, pictures, Statistics, Travel, University life on July 3, 2013 by xi'an

[inversion building over the Australian Capital Territory from Black Mountain, Aug. 14, 2012]

I just got the announcement that MaxEnt 2013, the 33rd of its kind, is taking place in Canberra, Australia, next December. (Which is winter here but summer there!) See the website for details, although they are not yet aplenty! I took part in MaxEnt 2009, in Oxford, Mississippi, but will not attend MaxEnt 2013 as it is (far away and) held during O-Bayes 2013 at Duke…

May I believe I am a Bayesian?!

Posted in Books, Statistics, University life on January 21, 2012 by xi'an

“…the argument is false that because some ideal form of this approach to reasoning seems excellent in theory it therefore follows that in practice using this and only this approach to reasoning is the right thing to do.” Stephen Senn, 2011

Deborah Mayo, Aris Spanos, and Kent Staley have edited a special issue of Rationality, Markets and Morals (RMM) (a rather weird combination, esp. for a journal name!) on “Statistical Science and Philosophy of Science: Where Do (Should) They Meet in 2011 and Beyond?” for which comments are open. Stephen Senn has a paper therein entitled You May Believe You Are a Bayesian But You Are Probably Wrong in his usual witty, entertaining, and… Bayesian-bashing style! I find it very kind of him to allow us to remain in the wrong, very kind indeed…


Now, the paper somehow intersects with the comments Stephen made on our review of Harold Jeffreys' Theory of Probability a while ago. It contains a nice introduction to the four great systems of statistical inference, embodied by de Finetti, Fisher, Jeffreys, and Neyman plus Pearson. The main criticism of Bayesianism à la de Finetti is that it is so perfect as to be otherworldly. And, since this perfection is lost in the practical implementation, there is no compelling reason to be a Bayesian. Worse, all practical Bayesian implementations conflict with Bayesian principles. Hence a Bayesian author "in practice is wrong". Stephen concludes with a call for eclecticism, quite in line with his usual style, since this is likely to antagonise everyone. (I wonder whether or not the absence of a final dot to the paper has a philosophical meaning. Since I have been caught over-interpreting book covers, I will not say more!) As I will try to explain below, I believe Stephen has paradoxically fallen victim to over-theorising/philosophising himself! (I refer the interested reader to the above post as well as to my comments on Don Fraser's "Is Bayes posterior quick and dirty confidence?" for more related points, esp. Senn's criticisms of objective Bayes on page 52, which are not so central to this discussion… Same thing for the different notions of probability [p.49] and the relative difficulties of the terms in (2) [p.50]. Deborah Mayo has a "deconstructed" version of Stephen's paper on her blog, with a much deeper if deBayesian philosophical discussion. And Andrew Jaffe wrote a post in reply to Stephen's paper, whose points I cannot discuss for lack of time, but which contains an interesting mention of Jaynes as missing from Senn's pantheon.)


“The Bayesian theory is a theory on how to remain perfect but it does not explain how to become good.” Stephen Senn, 2011

While associating theories with characters is a reasonable rhetorical device, especially with larger-than-life characters like those above!, I think it deters the reader from a philosophical questioning of the theory behind the (big) man. (In fact, it is a form of bullying or, more politely (?), of having big names shoved down your throat as a form of argument.) In particular, Stephen freezes the (Bayesian reasoning about the) Bayesian paradigm in its de Finetti phase-state, arguing about what de Finetti thought and believed. While this is historically interesting, I do not see why we should care at the praxis level. (I have made similar comments on this blog about the unpleasant aspects of being associated with one character, esp. the mysterious Reverend Bayes!) But this is not my main point.

“…in practice things are not so simple.” Stephen Senn, 2011

The core argument in Senn's diatribe is that reality is always more complex than the theory allows for, and thus that a Bayesian has to compromise on her/his perfect theory with reality/practice in order to reach decisions. A kind of philosophical equivalent of Achilles and the tortoise. However, it seems to me that the very fact that the Bayesian paradigm is a learning principle implies that imprecisions and imperfections are naturally incorporated into the decision process, thus avoiding the apparent infinite regress (Regress ins Unendliche) of having to run a Bayesian analysis to derive the prior for the Bayesian analysis at the level below (which is how I interpret Stephen's first paragraph in Section 3). By refusing the transformation of a perfect albeit ideal Bayesian into a practical if imperfect Bayesian (or coherent learner, or whatever name does not sound like membership of a sect!), Stephen falls short of incorporating the reality constraint (contrainte de réalité) into his own paradigm. The further criticisms about prior justification, construction, and evaluation (pp.59-60) are of the same kind, namely preventing the statistician from incorporating a degree of (probabilistic) uncertainty into her/his analysis.

In conclusion, reading Stephen's piece was a pleasant and thought-provoking moment. I am glad to be allowed to believe I am a Bayesian, even though I do not believe it is a belief! The praxis of thousands of scientists using Bayesian tools with their personal degrees of subjective involvement is an evolving organism that reaches much further than the highly stylised construct of de Finetti (or of de Finetti restaged by Stephen!), and one that appropriately moves away from claims to being perfect or right, or even to being more philosophical.

Jaynes’ marginalisation paradox

Posted in Books, Statistics, University life on June 13, 2011 by xi'an

After delivering my one-day lecture on Jaynes' Probability Theory, I asked the students, as an assignment, to write their own analysis of Chapter 15 (Paradoxes of probability theory), given its extensive and exciting coverage of the marginalisation paradoxes and my omission of it from the lecture notes… So far, only Jean-Bernard Salomon has returned a (good albeit short) synthesis of the chapter, seemingly siding with Jaynes' analysis that a "good" noninformative prior should avoid the paradox. (In short, my own view of the problem is to side with Dawid, Stone, and Zidek, in that the paradox is only a paradox when interpreting marginals of infinite measures as if they were probability marginals…) This made me wonder if there could be a squared marginalisation paradox: find a statistical model parameterised by θ with a nuisance parameter η=η(θ) such that, when the parameter of interest is ξ=ξ(θ), the prior on η solving the marginalisation paradox is not the same as when the parameter of interest is ζ=ζ(θ). [I have not given the problem more than a few seconds' thought, so this may prove a logical impossibility!]

Frequency vs. probability

Posted in Statistics on May 6, 2011 by xi'an

“Probabilities obtained by maximum entropy cannot be relevant to physical predictions because they have nothing to do with frequencies.” E.T. Jaynes, PT, p.366

“A frequency is a factual property of the real world that we measure or estimate. The phrase `estimating a probability’ is just as much an incongruity as `assigning a frequency’. The fundamental, inescapable distinction between probability and frequency lies in this relativity principle: probabilities change when we change our state of knowledge, frequencies do not.” E.T. Jaynes, PT, p.292

A few days ago, I got the following email exchange with Jelle Wybe de Jong from The Netherlands:

Q. I have a question regarding your slides of your presentation of Jaynes' Probability Theory. You used the [above second] quote: do you agree with this statement? It seems to me that a lot of 'Bayesians' still refer to 'estimating' probabilities. Does it make sense, for example, for a bank to estimate a probability of default for their loan portfolio? Or does it only make sense to estimate a default frequency and summarise the uncertainty (state of knowledge) through the posterior?
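For what it is worth, the textbook Bayesian answer to the banking part of the question treats the unknown default frequency as a parameter carrying a full posterior distribution rather than a point estimate. A hypothetical sketch with a conjugate Beta-Binomial model (the portfolio counts below are invented for illustration):

```python
import random

# hypothetical loan portfolio: k defaults observed out of n loans
n, k = 1000, 23

# Beta(a0, b0) prior on the default frequency, conjugate to the
# Binomial likelihood, so the posterior is Beta(a0 + k, b0 + n - k)
a0, b0 = 1.0, 1.0            # uniform prior
a, b = a0 + k, b0 + (n - k)  # posterior parameters

# posterior mean: a point summary, if one insists on a single number
post_mean = a / (a + b)

# the state of knowledge is the whole posterior, summarised here
# by a central 95% credible interval via Monte Carlo
draws = sorted(random.betavariate(a, b) for _ in range(10_000))
lo, hi = draws[250], draws[9750]
```

In this view one never "estimates a probability": the posterior on the default frequency is the full answer, and any single number is merely a summary of it.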

