I found my (short) trip to Abidjan for the CISEA 2019 conference quite fantastic, as it allowed me to meet old friends, from the earliest days at CREST and even before, and to make new ones, including local ENSEA students who had taken a Bayesian course based on my book The Bayesian Choice. And who had questions about the nature of priors and some difficulty accepting that several replies were possible with the same data! I wish I had had more time to discuss the relativity of Bayesian statements with them, but this was a great and rare opportunity to find avid readers of my books! I also had a long chat with another student worried about the use or mis-use of reversible jump algorithms to draw inference on time-series models in Bayesian Essentials, a chat that actually demonstrated his perfect understanding of the matter. And it was fabulous to meet so many statisticians and econometricians from West Africa, most of them French-speaking. My only regret is not having any free time to visit Abidjan or its surroundings, as the schedule of the conference did not allow for it [or even for a timely posting of this post!], especially as it regularly ran overtime. (But it did provide a wide range of new local dishes that I definitely enjoyed tasting!) We are now discussing further opportunities to visit, e.g., by teaching a short course at the Master's or PhD level.

## Archive for The Bayesian Choice

## ENSEA & CISEA 2019

Posted in Books, pictures, Statistics, Travel, University life with tags Abidjan, Africa, Bayesian Essentials with R, CISEA 2019, econometrics, ENSAE, ENSEA, Francophonie, Ivory Coast, The Bayesian Choice, West Africa on June 26, 2019 by xi'an

## from tramway to Panzer (or back!)…

Posted in Books, pictures, Statistics with tags Bayesian Analysis, German tank problem, Laplace succession rule, order statistics, The Bayesian Choice, tramway problem, tramways on June 14, 2019 by xi'an

**A**lthough it is usually presented as *the tramway problem*, namely estimating the number of tram or bus lines in a city from the observation of a single line number, including in The Bayesian Choice by yours truly, the original version of the problem is about German tanks, Panzer V tanks to be precise, whose total number *M* was to be estimated by the Allies from the serial numbers of a number *k* of observed tanks. The Riddler restates the problem when the only available information is the smallest, 22, and largest, 144, of the observed numbers, with no information about the number *k* itself. I am unsure what the Riddler means by "best" estimate, but a posterior distribution on *M* (and *k*) can certainly be constructed for a prior like *1/k x 1/M²* on *(k,M)*. (Using M² to make sure the posterior mean does exist.) The joint distribution of the two extreme order statistics is

$$\mathbb{P}\big(X_{(1)}=a,\,X_{(k)}=b\mid k,M\big)=\frac{\binom{b-a-1}{k-2}}{\binom{M}{k}},\qquad 1\le a<b\le M,$$

which makes the computation of the posterior distribution rather straightforward. Here is the posterior surface (with an unfortunate rendering of an artefactual horizontal line at 237!), showing a concentration near the lower bound M=144. The posterior mode is actually achieved for M=144 and k=7, while the posterior means are (rounded as) M=169 and k=9.
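The mode and means quoted above can be reproduced with a few lines of code (a minimal sketch: the sum over M is truncated at an arbitrary cap, which is harmless since the tail in M decays fast under the 1/M² prior):

```python
from math import comb

a, b, M_cap = 22, 144, 2000   # observed extremes; truncation point for the sum in M

post = {}
for M in range(b, M_cap + 1):
    for k in range(2, b - a + 2):      # the k-2 middle numbers must fit between a and b
        # prior 1/(k M^2) times P(min=a, max=b | k, M) = C(b-a-1, k-2) / C(M, k)
        post[(k, M)] = comb(b - a - 1, k - 2) / comb(M, k) / (k * M**2)

Z = sum(post.values())
mode = max(post, key=post.get)                          # joint posterior mode in (k, M)
mean_M = sum(M * w for (k, M), w in post.items()) / Z   # posterior mean of M
mean_k = sum(k * w for (k, M), w in post.items()) / Z   # posterior mean of k
print(mode, round(mean_M), round(mean_k))
```

A quick ratio argument confirms the mode analytically: at M=144, the posterior is increasing in k up to k=7 and decreasing after, while for each k it is decreasing in M.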

## leave Bayes factors where they once belonged

Posted in Statistics with tags Bayes factors, Bayesian Analysis, Bayesian decision theory, cross validated, prior comparison, prior predictive, prior selection, The Bayesian Choice, The Beatles, using the data twice, xkcd on February 19, 2019 by xi'an

**I**n the past weeks I have received and read several papers (and X validated entries) where the Bayes factor is used to compare priors. Which does not look right to me, not on the basis of my general dislike of Bayes factors!, but simply because this seems to clash with the (my?) concept of Bayesian model choice, and also because data should not play a role in that situation: it is used to select a *prior*, hence at least twice to run the inference; it resorts to a *single* parameter value (namely the one behind the data) to decide between two distributions; it has no asymptotic justification; and it eventually favours the prior concentrated on the maximum likelihood estimator. And more. But I fear that this reticence to test for prior adequacy also extends to the prior predictive, or Box's p-value, namely the probability under this prior predictive to observe something "more extreme" than the current observation, to quote from David Spiegelhalter.
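To fix notation, Box's p-value can be sketched in the simplest conjugate setting, where the prior predictive is available in closed form (a toy illustration, with the model and numbers chosen here purely for the example):

```python
from math import erf, sqrt

def box_p_value(x_obs, tau2):
    """Prior predictive (Box) p-value in the toy model x ~ N(theta, 1),
    theta ~ N(0, tau2): marginally x ~ N(0, 1 + tau2), and the p-value is
    the two-sided tail probability of the observation under that marginal."""
    z = abs(x_obs) / sqrt(1 + tau2)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

print(box_p_value(1.96, 0.0))   # ~0.05: the usual two-sided tail when tau2 = 0
print(box_p_value(1.96, 3.0))   # larger: a diffuse prior makes x less "extreme"
```

Note how the same observation becomes less "extreme" as the prior spreads out, which is precisely why this quantity assesses the prior rather than the sampling model alone.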

## a glaring mistake

Posted in Statistics with tags Bayes factor, Bayesian hypothesis testing, Bayesian textbook, cross validated, Stack Exchange, The Bayesian Choice, typos on November 28, 2018 by xi'an

**S**omeone posted this question about Bayes factors in my book on Saturday morning and I could not believe the glaring typo pointed out there had gone through the centuries without anyone noticing! There should be no index 0 or 1 on the θ's in either integral (or else indices all over). I presume I made this typo when cutting & pasting from the previous formula (which addressed the case of two point null hypotheses), but I am quite chagrined that I sabotaged the definition of the Bayes factor for generations of readers of The Bayesian Choice. Apologies!!!
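For the record, the standard statement of the Bayes factor of H₀: θ∈Θ₀ against H₁: θ∈Θ₁ reads, with the integration variable θ unindexed in both integrals (the indices belong to the priors π₀ and π₁ and to the domains, not to θ):

```latex
B_{01}(x) \;=\;
\frac{\displaystyle\int_{\Theta_0} f(x\mid\theta)\,\pi_0(\theta)\,\mathrm{d}\theta}
     {\displaystyle\int_{\Theta_1} f(x\mid\theta)\,\pi_1(\theta)\,\mathrm{d}\theta}
```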

## back to the Bayesian Choice

Posted in Books, Kids, Statistics, University life with tags autoregressive model, Bayesian decision theory, Book, exercises, improper posteriors, improper prior, inverse Gamma distribution, prior predictive, The Bayesian Choice on October 17, 2018 by xi'an

**S**urprisingly (or not?!), I received two requests about some exercises from The Bayesian Choice: one from a group of students from McGill having difficulties solving the above, wondering about the properness of the posterior (but missing the integration of x), to whom I sent back this correction; and another one from the Czech Republic about a difficulty with the term "evaluation", by which I meant (pardon my French!) estimation.

## about paradoxes

Posted in Books, Kids, Statistics, University life with tags bias, book review, email, Jacobian, Mark Chang, MLE, paradoxes, reparameterisation, scientific inference, The Bayesian Choice, unbiasedness on December 5, 2017 by xi'an

**A**n email I received earlier today about statistical paradoxes:

I am a PhD student in biostatistics, and an avid reader of your work. I recently came across this blog post, where you review a text on statistical paradoxes, and I was struck by this section:

I found this section provocative, but I am unclear on the nature of these “paradoxes”. I reviewed my stat inference notes and came across the classic example that there is no unbiased estimator for 1/p w.r.t. a binomial distribution, but I believe you are getting at a much more general result. If it’s not too much trouble, I would sincerely appreciate it if you could point me in the direction of a reference or provide a bit more detail for these two “paradoxes”.

The text is Chang's Paradoxes in Scientific Inference, which I indeed reviewed negatively. To answer about the bias "paradox": it is indeed a neglected fact that, while the average of *any* transform of a sample obviously is an unbiased estimator of its mean (!), the converse does not hold, namely, an *arbitrary* transform of the model parameter θ does not necessarily enjoy an unbiased estimator. In Lehmann and Casella, Chapter 2, Section 4, this issue is (just slightly) discussed. But essentially, the transforms that lead to unbiased estimators are mostly the polynomial transforms of the mean parameters… (This also somewhat connects with a recent X validated question as to why MLEs are not always unbiased, although the simplest explanation is that the transform of the MLE is the MLE of the transform!) In exponential families, I would deem the range of transforms with unbiased estimators closely related to the collection of functions that allow for inverse Laplace transforms, although I cannot quote a specific result supporting this hunch.
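The Binomial example quoted in the email can be checked numerically: since E[δ(X)] is a polynomial of degree at most n in p, it stays bounded as p→0 and hence can never equal 1/p. The natural candidate (n+1)/(X+1) comes close, with E[(n+1)/(X+1)] = (1-(1-p)ⁿ⁺¹)/p, but the bias never vanishes (a toy check, with n=10 an arbitrary choice):

```python
from math import comb

def expect(d, n, p):
    """Exact expectation of d(X) when X ~ Binomial(n, p): a polynomial in p."""
    return sum(d(x) * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))

n = 10
d = lambda x: (n + 1) / (x + 1)          # candidate estimator of 1/p

for p in (0.5, 0.1, 0.01):
    # closed form: E[(n+1)/(X+1)] = (1 - (1-p)^(n+1)) / p, never exactly 1/p
    print(p, expect(d, n, p), (1 - (1 - p)**(n + 1)) / p, 1 / p)
```

The closed form follows from (n+1)/(x+1) C(n,x) = C(n+1,x+1), which turns the sum into an incomplete Binomial(n+1,p) expansion.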

The other "paradox" is that, if h(X) is the MLE of the model parameter θ for the observable X, the distribution of h(X) has a density different from the density of X and, hence, maximising it in the parameter θ may lead to a different estimate. An example (my favourite!) is the MLE of ||a||² based on x ~ N(a,I), which is ||x||², a poor estimate, and which (strongly) differs from the MLE of ||a||² based on ||x||², which is close to (1-p/||x||²)²||x||² and (nearly) admissible [as discussed in The Bayesian Choice].
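How poor the plug-in MLE ||x||² is can be seen in a quick Monte Carlo comparison (a sketch, using the simpler truncated correction max(||x||²-p, 0) as the competitor, with p=10 and ||a||²=4 chosen arbitrarily; since E[||x||²] = ||a||²+p, the plug-in estimate overshoots by p on average):

```python
import random

random.seed(0)
p, theta = 10, 4.0                # dimension and true value of ||a||^2
a = [theta**0.5] + [0.0] * (p - 1)  # any vector a with ||a||^2 = theta
N = 50_000

se_mle, se_cor = 0.0, 0.0
for _ in range(N):
    nx2 = sum((ai + random.gauss(0.0, 1.0))**2 for ai in a)  # ||x||^2, x ~ N(a, I_p)
    se_mle += (nx2 - theta)**2                 # plug-in MLE ||x||^2
    se_cor += (max(nx2 - p, 0.0) - theta)**2   # bias-corrected, truncated at zero

print(se_mle / N, se_cor / N)   # the corrected estimator has a much smaller MSE
```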

## relativity is the keyword

Posted in Books, Statistics, University life with tags Bayes factor, model posterior probabilities, OxWaSP, relativity, Saint Giles cemetery, testing of hypotheses, The Bayesian Choice, University of Oxford on February 1, 2017 by xi'an

**A**s I was teaching my introduction to Bayesian Statistics this morning, ending with the chapter on tests of hypotheses, I found myself reflecting [out loud] on the relative nature of posterior quantities. Just as when I introduced the role of priors in Bayesian analysis the day before, I stressed the relativity of quantities coming out of the BBB [Big Bayesian Black Box], namely that whatever comes out of a Bayesian procedure is to be understood, scaled, and relativised against its prior equivalent, i.e., the reference measure or gauge is the prior. This is sort of obvious, clearly, but bringing the argument forward from the start avoids all sorts of misunderstandings and disagreements, in that it excludes the claims of absoluteness and certainty that may come with the production of a posterior distribution. It also removes the endless debate about the determination of *the* prior, by making *each* prior a reference on its own. With an additional possibility of calibration by simulation under the assumed model. Or an alternative. Again, nothing new there, but I got rather excited by this presentation choice, as it seems to clarify the path to Bayesian modelling and to avoid misapprehensions.

Further, the curious case of the Bayes factor (or of the posterior probability) could possibly be resolved most satisfactorily within this framework, as the [dreaded] dependence on the model prior probabilities then becomes a matter of relativity! Those posterior probabilities depend directly and almost linearly on the prior probabilities, but they should not be interpreted in an *absolute* sense, as the ultimate and unique probability of the hypothesis (which anyway does not mean anything in terms of the observed experiment). In other words, this posterior probability does not need to be scaled against a U(0,1) distribution. Or against the *p*-value, if anyone wishes to do so. By the end of the lecture, I was even wondering [not so loudly] whether or not this perspective allowed for a resolution of the Lindley-Jeffreys paradox, as the resulting number could be set relative to the choice of the [arbitrary] normalising constant.
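The near-linear dependence mentioned above is immediate from the odds formulation: for a fixed Bayes factor B01, the posterior probability of H0 is a monotone transform of its prior probability ρ, since posterior odds equal B01 times prior odds (a minimal illustration, with B01 = 3 picked arbitrarily):

```python
def post_prob(rho, B01):
    """Posterior probability of H0 when its prior probability is rho and
    the Bayes factor of H0 against H1 is B01 (posterior odds = B01 x prior odds)."""
    return rho * B01 / (rho * B01 + (1 - rho))

for rho in (0.1, 0.5, 0.9):
    print(rho, post_prob(rho, B01=3.0))
```

The same evidence B01 thus produces any posterior probability between 0 and 1 depending on ρ, which is exactly why reading the output in an absolute sense, detached from its prior reference, is meaningless.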