Terry Speed wrote a column in the latest IMS Bulletin (the one I received a week ago) about the choice of the denominator in the variance estimator. That is, should s² involve n (number of observations), n-1 (degrees of freedom), n+1 or anything else in its denominator? I find the question more interesting than the answer (sorry, Terry!) as it demonstrates quite forcibly that there is no single possible choice for this estimator of the variance but that instead the “optimal” estimator is determined by the choice of the optimality criterion: this makes for a wonderful (if rather formal) playground for a class on decision theoretic statistics, and I often use it on my students. Non-Bayesian mathematical statistics courses often give the impression that there is a natural (single) estimator, when this estimator is based on an implicit choice of an optimality criterion. (This issue is illustrated in the books of Chang and of Vasishth and Broe I discussed earlier, as well as by the Stein effect, of course.) I thus deem it worthwhile to impress upon all users of statistics that there is no such single optimal choice, that unbiasedness is not a compulsory property (just as well, since most parameters cannot be estimated in an unbiased manner!), and that there is room for a subjective choice of a “best” estimator, as paradoxical as it may sound to non-statisticians.
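To make the dependence on the criterion concrete, here is a minimal sketch under an assumption the column does not impose, namely i.i.d. normal observations. In that case S = Σ(xᵢ − x̄)² is distributed as σ²χ² with n−1 degrees of freedom, so the bias and the mean squared error of S/d are available in closed form. The function names `mse_normal` and `simulate` are purely illustrative.

```python
import random


def mse_normal(n, d, sigma2=1.0):
    """Exact (bias, MSE) of S/d for i.i.d. normal data, S = sum (x_i - xbar)^2."""
    mean = (n - 1) * sigma2 / d                 # E[S/d], as S/sigma2 ~ chi2(n-1)
    var = 2 * (n - 1) * sigma2 ** 2 / d ** 2    # Var[S/d]
    bias = mean - sigma2
    return bias, var + bias ** 2


def simulate(n, d, reps=20000, sigma2=1.0, seed=0):
    """Monte Carlo estimate of the MSE of S/d, as a check on the formula."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    sq_err = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sd) for _ in range(n)]
        xbar = sum(xs) / n
        s = sum((x - xbar) ** 2 for x in xs)
        sq_err += (s / d - sigma2) ** 2
    return sq_err / reps


if __name__ == "__main__":
    n = 10
    for d in (n - 1, n, n + 1):
        bias, mse = mse_normal(n, d)
        print(f"d={d}: bias={bias:+.4f}  exact MSE={mse:.4f}  "
              f"simulated MSE={simulate(n, d):.4f}")
```

Under this normal assumption, the denominator n−1 is the only unbiased choice, while n+1 gives the smallest mean squared error (2σ⁴/(n+1)) among estimators of the form S/d. Under another loss or another sampling model the ranking may well change, which is the whole point.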
Archive for unbiasedness
(I received the following set of comments from Mark Chang after publishing a review of his book on the ‘Og. Here they are, verbatim, except for a few editing and spelling changes. It’s a huge post as Chang reproduces all of my comments as well.)
Professor Christian Robert reviewed my book: “Paradoxes in Scientific Inference”. I found that the majority of his criticisms had no foundation and were based on his truncated way of reading. I gave point-by-point responses below. For clarity, I kept his original comments.
Robert’s Comments: This CRC Press book was sent to me for review in CHANCE: Paradoxes in Scientific Inference is written by Mark Chang, vice-president of AMAG Pharmaceuticals. The topic of scientific paradoxes is one of my primary interests and I have learned a lot by looking at Lindley-Jeffreys and Savage-Dickey paradoxes. However, I did not find a renewed sense of excitement when reading the book. The very first (and maybe the best!) paradox with Paradoxes in Scientific Inference is that it is a book from the future! Indeed, its copyright year is 2013 (!), although I got it a few months ago. (Not mentioning here the cover mimicking Escher’s “paradoxical” pictures with dices. A sculpture due to Shigeo Fukuda and apparently not quoted in the book. As I do not want to get into another dice cover polemic, I will abstain from further comments!)
Thank you, Robert for reading and commenting on part of my book. I had the same question on the copyright year being 2013 when it was actually published in previous year. I believe the same thing had happened to my other books too. The incorrect year causes confusion for future citations. The cover was designed by the publisher. They gave me few options and I picked the one with dices. I was told that the publisher has the copyright for the art work. I am not aware of the original artist.
This afternoon, Jean-Michel Marin gave his talk at the big’MC seminar. As already posted, it was about a convergence proof for AMIS, which gave me the opportunity to simultaneously read the paper and listen to the author. The core idea for adapting AMIS towards a manageable version is to update the proposal parameter based on the current sample rather than on the whole past. This facilitates the task of establishing convergence to the optimal (pseudo-true) value of the parameter, under an assumption that the optimal value is a known moment of the target. From there, convergence of the weighted mean is somehow natural when the number of simulations grows to infinity. (Note the special asymptotics of AMIS, though, which are that the number of steps goes to infinity while the number of simulations per step grows a wee bit faster than linearly. In this respect, it is the opposite of PMC, where convergence is of a more traditional nature, pushing the number of simulations per step to infinity.) The second part of the convergence proof is more intricate, as it establishes that the multiple mixture estimator based on the “forward-backward” reweighting of all simulations since step zero does converge to the proper posterior moment. This relies on rather complex assumptions, but remains a magnificent tour de force. During the talk, I wondered if, given the Markovian nature of the algorithm (since reweighting only occurs once simulation is over), an alternative estimator based on the optimal value of the simulation parameter would not be better than the original multiple mixture estimator: the proof is based on the equivalence between both versions…
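The two stages described above (adapt the proposal from the current sample only, then reweight everything once simulation is over) can be sketched on a toy problem. This is a simplified illustration under my own assumptions, not the algorithm of the paper: the target is an unnormalised N(3,1) density, the proposal is a normal with fixed scale whose mean is the adapted parameter, and the sample-size schedule with exponent 1.1 is an arbitrary stand-in for “a wee bit faster than linearly”. All names (`target_unnorm`, `adaptive_is`, and so on) are hypothetical.

```python
import math
import random


def target_unnorm(x):
    """Toy unnormalised target: N(3, 1) up to a constant (an assumption)."""
    return math.exp(-0.5 * (x - 3.0) ** 2)


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def adaptive_is(n_steps=10, base=200, sd=2.0, mu0=0.0, seed=1):
    rng = random.Random(seed)
    mus, samples = [], []
    mu = mu0
    for t in range(n_steps):
        n_t = int(base * (t + 1) ** 1.1)   # slightly faster than linear growth
        xs = [rng.gauss(mu, sd) for _ in range(n_t)]
        # Importance weights against the current proposal only.
        ws = [target_unnorm(x) / normal_pdf(x, mu, sd) for x in xs]
        mus.append(mu)
        samples.append(xs)
        # The proposal mean is updated from the current sample, not the past.
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)

    # Once simulation is over, reweight all draws against the mixture of
    # every proposal used (multiple mixture weights).
    total = sum(len(xs) for xs in samples)
    num = den = 0.0
    for xs in samples:
        for x in xs:
            mix = sum(len(s) * normal_pdf(x, m, sd)
                      for s, m in zip(samples, mus)) / total
            w = target_unnorm(x) / mix
            num += w * x
            den += w
    return num / den, mus


if __name__ == "__main__":
    estimate, mus = adaptive_is()
    print("proposal means:", [round(m, 3) for m in mus])
    print("self-normalised estimate of the target mean:", round(estimate, 3))
```

In this toy case the adapted parameter is a moment of the target (its mean), which mirrors the assumption mentioned above. The final estimator is self-normalised because the target is only known up to a constant.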
beware, nefarious Bayesians threaten to take over frequentism using loss functions as Trojan horses!
Posted in Books, pictures, Statistics with tags admissibility, Aris Spanos, arXiv, decision theory, econometrics, Erich Lehmann, James-Stein estimator, linear model, loss functions, Lucien Le Cam, minimaxity, Stein effect, unbiasedness on November 12, 2012 by xi'an
“It is not a coincidence that textbooks written by Bayesian statisticians extol the virtue of the decision-theoretic perspective and then proceed to present the Bayesian approach as its natural extension.” (p.19)
“According to some Bayesians (see Robert, 2007), the risk function does represent a legitimate frequentist error because it is derived by taking expectations with respect to [the sampling density]. This argument is misleading for several reasons.” (p.18)
During my R exam, I read the recent arXiv posting by Aris Spanos on why “the decision theoretic perspective misrepresents the frequentist viewpoint”. The paper is entitled “Why the Decision Theoretic Perspective Misrepresents Frequentist Inference: ‘Nuts and Bolts’ vs. Learning from Data” and I found it at the very least puzzling… The main theme is the one caricatured in the title of this post, namely that the decision-theoretic analysis of frequentist procedures is a trick brought by Bayesians to justify their own procedures. The fundamental argument behind this perspective is that decision theory operates in a “for all θ” referential while frequentist inference (in Spanos’ universe) is only concerned with one θ, the true value of the parameter. (Incidentally, the “nuts and bolts” refers to the only case when a decision-theoretic approach is relevant from a frequentist viewpoint, namely in factory quality control sampling.)
“The notions of a risk function and admissibility are inappropriate for frequentist inference because they do not represent legitimate error probabilities.” (p.3)
“An important dimension of frequentist inference that has not been adequately appreciated in the statistics literature concerns its objectives and underlying reasoning.” (p.10)
“The factual nature of frequentist reasoning in estimation also brings out the impertinence of the notion of admissibility stemming from its reliance on the quantifier ‘for all’.” (p.13)
One strange feature of the paper is that Aris Spanos seems to appropriate for himself the notion of frequentism, rejecting the choices made by (what I would call frequentist) pioneers like Wald, Neyman, “Lehmann and LeCam [sic]”, Stein. Apart from Fisher (and the paper is strongly grounded in neo-Fisherian revivalism), the only frequentists seemingly finding grace in the eyes of the author are George Box, David Cox, and George Tiao. (The references are mostly to textbooks, incidentally.) Modern authors that clearly qualify as frequentists like Bickel, Donoho, Johnstone, or, to mention the French school, e.g., Birgé, Massart, Picard, Tsybakov, none of whom can be suspected of Bayesian inclinations!, do not appear either as satisfying those narrow tenets of frequentism. Furthermore, the concept of frequentist inference is never clearly defined within the paper. As in the above quote, the notion of “legitimate error probabilities” pops up repeatedly (15 times) within the whole manifesto without being explicitly defined. (The closest to a definition is found on page 17, where the significance level and the p-value are found to be legitimate.) Aris Spanos even rejects what I would call the von Mises basis of frequentism: “contrary to Bayesian claims, those error probabilities have nothing to to do with the temporal or the physical dimension of the long-run metaphor associated with repeated samples” (p.17), namely that a statistical procedure cannot be evaluated on its long term performance…
Here is a question I posted on Stack Exchange a while ago:
In a setting where one observes X1,…,Xn distributed from a distribution with (unknown) density f, I wonder if there is an unbiased estimator (based on the Xi’s) of the Hellinger distance to another distribution with known density f0, namely
for the Hellinger distance. In addition, this estimator is guaranteed to enjoy a finite variance since
Considering this question again, I am now fairly convinced there cannot be an unbiased estimator of H, as it behaves like a standard deviation for which there usually is no unbiased estimator!
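The analogy with the standard deviation can be spelled out in a short derivation. It assumes the usual normalisation of the Hellinger distance (the first display below), and it only covers plug-in estimators of the form √T, so it is a heuristic rather than a proof that no unbiased estimator of H exists.

```latex
% Assumed normalisation of the Hellinger distance between f and f_0:
\[
  H(f,f_0) \;=\; \Big\{\, 1 - \int \sqrt{f(x)\,f_0(x)}\;\mathrm{d}x \Big\}^{1/2}.
\]
% Suppose T = T(X_1,\ldots,X_n) \ge 0 is unbiased for H^2 and is not almost
% surely constant. Since the square root is strictly concave on [0,\infty),
% Jensen's inequality gives
\[
  \mathbb{E}_f\big[\sqrt{T}\,\big] \;<\; \sqrt{\mathbb{E}_f[T]} \;=\; H(f,f_0),
\]
% so \sqrt{T} underestimates H on average, exactly as the sample standard
% deviation s underestimates \sigma even though s^2 is unbiased for \sigma^2.
```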