Archive for The Bayesian Choice

about paradoxes

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , on December 5, 2017 by xi'an

An email I received earlier today about statistical paradoxes:

I am a PhD student in biostatistics, and an avid reader of your work. I recently came across this blog post, where you review a text on statistical paradoxes, and I was struck by this section:

“For instance, the author considers the MLE being biased to be a paradox (p.117), while omitting the much more substantial “paradox” of the non-existence of unbiased estimators of most parameters—which simply means unbiasedness is irrelevant. Or the other even more puzzling “paradox” that the secondary MLE derived from the likelihood associated with the distribution of a primary MLE may differ from the primary. (My favourite!)”

I found this section provocative, but I am unclear on the nature of these “paradoxes”. I reviewed my stat inference notes and came across the classic example that there is no unbiased estimator for 1/p w.r.t. a binomial distribution, but I believe you are getting at a much more general result. If it’s not too much trouble, I would sincerely appreciate it if you could point me in the direction of a reference or provide a bit more detail for these two “paradoxes”.

The text is Chang’s Paradoxes in Scientific Inference, which I indeed reviewed negatively. To answer about the bias “paradox”, it is indeed a neglected fact that, while the average of any transform of a sample obviously is an unbiased estimator of its mean (!), the converse does not hold, namely, an arbitrary transform of the model parameter θ is not necessarily enjoying an unbiased estimator. In Lehmann and Casella, Chapter 2, Section 4, this issue is (just slightly) discussed. But essentially, transforms that lead to unbiased estimators are mostly the polynomial transforms of the mean parameters… (This also somewhat connects to a recent X validated question as to why MLEs are not always unbiased. Although the simplest explanation is that the transform of the MLE is the MLE of the transform!) In exponential families, I would deem the range of transforms with unbiased estimators closely related to the collection of functions that allow for inverse Laplace transforms, although I cannot quote a specific result on this hunch.

The other “paradox” is that, if h(X) is the MLE of the model parameter θ for the observable X, the distribution of h(X) has a density different from the density of X and, hence, its maximisation in the parameter θ may differ. An example (my favourite!) is the MLE of ||a||² based on x N(a,I) which is ||x||², a poor estimate, and which (strongly) differs from the MLE of ||a||² based on ||x||², which is close to (1-p/||x||²)²||x||² and (nearly) admissible [as discussed in the Bayesian Choice].

relativity is the keyword

Posted in Books, Statistics, University life with tags , , , , , , , on February 1, 2017 by xi'an

St John's College, Oxford, Feb. 23, 2012As I was teaching my introduction to Bayesian Statistics this morning, ending up with the chapter on tests of hypotheses, I found reflecting [out loud] on the relative nature of posterior quantities. Just like when I introduced the role of priors in Bayesian analysis the day before, I stressed the relativity of quantities coming out of the BBB [Big Bayesian Black Box], namely that whatever happens as a Bayesian procedure is to be understood, scaled, and relativised against the prior equivalent, i.e., that the reference measure or gauge is the prior. This is sort of obvious, clearly, but bringing the argument forward from the start avoids all sorts of misunderstanding and disagreement, in that it excludes the claims of absolute and certainty that may come with the production of a posterior distribution. It also removes the endless debate about the determination of the prior, by making each prior a reference on its own. With an additional possibility of calibration by simulation under the assumed model. Or an alternative. Again nothing new there, but I got rather excited by this presentation choice, as it seems to clarify the path to Bayesian modelling and avoid misapprehensions.

Further, the curious case of the Bayes factor (or of the posterior probability) could possibly be resolved most satisfactorily in this framework, as the [dreaded] dependence on the model prior probabilities then becomes a matter of relativity! Those posterior probabilities depend directly and almost linearly on the prior probabilities, but they should not be interpreted in an absolute sense as the ultimate and unique probability of the hypothesis (which anyway does not mean anything in terms of the observed experiment). In other words, this posterior probability does not need to be scaled against a U(0,1) distribution. Or against the p-value if anyone wishes to do so. By the end of the lecture, I was even wondering [not so loudly] whether or not this perspective was allowing for a resolution of the Lindley-Jeffreys paradox, as the resulting number could be set relative to the choice of the [arbitrary] normalising constant. Continue reading

back in Oxford

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , on January 30, 2017 by xi'an

As in the previous years, I am back in Oxford (England) for my short Bayesian Statistics course in the joint Oxford-Warwick PhD programme, OxWaSP.  For some unclear reason, presumably related to the Internet connection from Oxford, I have not been able to upload my slides to Slideshare, so here the [99.9% identical] older version:

same data – different models – different answers

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , on June 1, 2016 by xi'an

An interesting question from a reader of the Bayesian Choice came out on X validated last week. It was about Laplace’s succession rule, which I found somewhat over-used, but it was nonetheless interesting because the question was about the discrepancy of the “non-informative” answers derived from two models applied to the data: an Hypergeometric distribution in the Bayesian Choice and a Binomial on Wikipedia. The originator of the question had trouble with the difference between those two “non-informative” answers as she or he believed that there was a single non-informative principle that should lead to a unique answer. This does not hold, even when following a reference prior principle like Jeffreys’ invariant rule or Jaynes’ maximum entropy tenets. For instance, the Jeffreys priors associated with a Binomial and a Negative Binomial distributions differ. And even less when considering that  there is no unity in reaching those reference priors. (Not even mentioning the issue of the reference dominating measure for the definition of the entropy.) This led to an informative debate, which is the point of X validated.

On a completely unrelated topic, the survey ship looking for the black boxes of the crashed EgyptAir plane is called the Laplace.

off to Oxford

Posted in Kids, pictures, Travel, University life with tags , , , , , , , on January 31, 2016 by xi'an

Oxford, Feb. 23, 2012I am off to Oxford this evening for teaching once again in the Bayesian module of the OxWaSP programme. Joint PhD programme between Oxford and Warwick, supported by the EPSRC. And with around a dozen new [excellent!] PhD students every year. Here are the slides of a longer course that I will use in the coming days:

And by popular request (!) here is the heading of my Beamer file:

% Rather be using my own color
\setbeamercolor{alerted text}{fg=lightred}