## running out of explanations

Posted in Books, Kids, Statistics on September 23, 2015 by xi'an

A few days ago, I answered a self-study question on Cross Validated about the convergence in probability of 1/X given the convergence in probability of X to a. Until I ran out of explanations… I did not see how to detail any further the connection between both properties! The reader (OP) started from a resolution of the corresponding exercise in Casella and Berger’s Statistical Inference and could not follow the steps, some of which were incorrect. But my attempts at making him uncover the necessary steps failed, presumably because he was sticking to this earlier resolution rather than starting from the definition of convergence in probability. And he could not get over the equality

$\mathbb{P}(|a/X_{i} - 1| < \epsilon)=\mathbb{P}\left(a-{{a\epsilon}\over{1 + \epsilon}} < X_{i} < a + {{a\epsilon}\over{1 - \epsilon}}\right)$

which is the central reason why one convergence transfers to the other… I know I know nothing, and even less about pedagogy, but it is (just so mildly!) frustrating to hit a wall beyond which no further explanation can help! Feel free to propose an alternative resolution.
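For what it is worth, the equality can at least be checked numerically. The sketch below assumes a > 0 and 0 < ε < 1, the case where no inequality flips: |a/x − 1| < ε then reads 1 − ε < a/x < 1 + ε, that is, a/(1+ε) < x < a/(1−ε), and these two bounds are exactly a − aε/(1+ε) and a + aε/(1−ε). The function names and the values a = 2, ε = 0.1 are illustrative choices of mine, not part of the exercise.

```python
import random


def in_event_ratio(x, a, eps):
    """Left-hand event: |a/x - 1| < eps."""
    return abs(a / x - 1.0) < eps


def in_event_interval(x, a, eps):
    """Right-hand event: a - a*eps/(1+eps) < x < a + a*eps/(1-eps)."""
    lower = a - a * eps / (1.0 + eps)  # algebraically equal to a / (1 + eps)
    upper = a + a * eps / (1.0 - eps)  # algebraically equal to a / (1 - eps)
    return lower < x < upper


def count_disagreements(a=2.0, eps=0.1, n=20_000, seed=1):
    """Count random points x on which the two events differ (assumes a > 0)."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(n):
        x = rng.uniform(0.5 * a, 1.5 * a)  # positive values around a
        if in_event_ratio(x, a, eps) != in_event_interval(x, a, eps):
            disagreements += 1
    return disagreements
```

Since the two events coincide, any probability statement about X falling in a shrinking interval around a transfers to a/X falling in a shrinking interval around 1, which is the whole argument.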

Update: A few days later, readers of Cross Validated pointed out that the question had been answered by whuber in a magisterial way. But I wonder if my original reader appreciated this resolution, since he did not pursue the issue.

## full Bayesian significance test

Posted in Books, Statistics on December 18, 2014 by xi'an

Among the many comments (thanks!) I received when posting our Testing via mixture estimation paper came the suggestion to relate this approach to the notion of full Bayesian significance test (FBST) developed by (Julio, not Hal) Stern and Pereira, from São Paulo, Brazil. I thus had a look at this alternative and read the Bayesian Analysis paper they published in 2008, as well as a paper recently published in Logic Journal of IGPL. (I could not find what the IGPL stands for.) The central notion in these papers is the e-value, which provides the posterior probability that the posterior density is larger than the largest posterior density over the null set. This definition bothers me, first because the null set has a measure equal to zero under an absolutely continuous prior (BA, p.82). Hence the posterior density is defined in an arbitrary manner over the null set and the maximum is itself arbitrary. (An issue that invalidates my 1993 version of the Lindley-Jeffreys paradox!) And second because it considers the posterior probability of an event that does not exist a priori, being conditional on the data. This sounds in fact quite similar to Statistical Inference, Murray Aitkin’s (2009) book using a posterior distribution of the likelihood function. With the same drawback of using the data twice. And the other issues discussed in our commentary on the book. (As a side-much-on-the-side remark, the authors incidentally forgot me when citing our 1992 Annals of Statistics paper about decision theory on accuracy estimators..!)
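To fix ideas about the quantity described above, here is a minimal sketch under assumptions of my own: the posterior is taken to be a normal N(m, s²), the null set is the single point θ0, and the continuous version of the normal density is used at θ0 (which is precisely the arbitrary choice objected to above). The function names are hypothetical and do not come from the FBST papers.

```python
import math
import random


def normal_pdf(theta, mean, sd):
    """Density of N(mean, sd^2) at theta."""
    z = (theta - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_density_above_null(post_mean, post_sd, theta0, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the posterior probability that the posterior
    density exceeds its value at the point null theta0."""
    rng = random.Random(seed)
    threshold = normal_pdf(theta0, post_mean, post_sd)
    hits = 0
    for _ in range(n_draws):
        theta = rng.gauss(post_mean, post_sd)  # draw from the posterior
        if normal_pdf(theta, post_mean, post_sd) > threshold:
            hits += 1
    return hits / n_draws


def prob_density_above_null_exact(post_mean, post_sd, theta0):
    """Closed form in the normal case: P(|theta - m| < |theta0 - m|)."""
    z = abs(theta0 - post_mean) / post_sd
    return math.erf(z / math.sqrt(2.0))
```

With m = 1, s = 0.5 and θ0 = 0, the null sits two posterior standard deviations away from the mode and the closed form returns about 0.954. Both the threshold and the event are functions of the data, which is where the "using the data twice" remark applies.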

## Nonlinear Time Series just appeared

Posted in Books, R, Statistics, University life on February 26, 2014 by xi'an

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The book reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating both inference in nonlinear time series and algorithms applied to those objects. The latter include MCMC, pMCMC, sequential Monte Carlo, particle filters, and the EM algorithm. While I am too close to the authors to write a balanced review for CHANCE (the book is under review by another researcher, before you ask!), I think this is an important book that reflects the state of the art in the rigorous study of those models. Obviously, the mathematical rigour advocated by the authors makes Nonlinear Time Series a rather advanced book (despite the authors’ reassuring statement that “nothing excessively deep is used”), more adequate for PhD students and researchers than for starting graduates (and definitely not advised for self-study), but the availability of the R code (on the highly personal page of David Stoffer) comes to balance the mathematical bent of the book in the first and third parts. A great reference book!

## Do we need…yes we do (with some delay)!

Posted in Books, Statistics, University life on April 4, 2013 by xi'an

## CHANCE: special issue on George Casella’s books

Posted in Books, R, Statistics, University life on February 10, 2013 by xi'an

The special issue of CHANCE on George Casella’s books has now appeared and it contains both my earlier post on George’s passing away and reviews of several of his books, as follows:

Although all of those books appeared between five and twenty years ago, the reviews are definitely worth reading! (Disclaimer: I am the editor of the Books Review section who contacted friends of George to write the reviews, as well as the co-author of two of those books!) They bring, in my (therefore biased) opinion, a worthy evaluation of the depths and impacts of those major books, and they also reveal why George was a great teacher, bringing much into the classroom and to his students… (Unless I am confused, the whole series of reviews is available to all, and not only to CHANCE subscribers. Thanks, Sam!)

## estimating a constant

Posted in Books, Statistics on October 3, 2012 by xi'an

Paulo (a.k.a., Zen) posted a comment on StackExchange on Larry Wasserman’s paradox about Bayesians and likelihoodists (or likelihood-wallahs, to quote Basu!) being unable to solve the problem of estimating the normalising constant c of the sample density, f, known up to a constant

$f(x) = c g(x)$

(Example 11.10, page 188, of All of Statistics)

My own comment is that, with all due respect to Larry!, I do not see much appeal in this example, esp. as a potential criticism of Bayesians and likelihood-wallahs… The constant c is known, being equal to

$1/\int_\mathcal{X} g(x)\text{d}x$

If c is the only “unknown” in the picture, given a sample x1,…,xn, then there is no statistical issue whatsoever about the “problem” and I do not agree with the postulate that there exist estimators of c. Nor priors on c (other than the Dirac mass on the above value). This is not in the least a statistical problem but rather a numerical issue. That the sample x1,…,xn can be (re)used through a (frequentist) density estimate to provide a numerical approximation of c

$\hat c = \hat f(x_0) \big/ g(x_0)$

is a mere curiosity. Not a criticism of alternative statistical approaches: e.g., I could also use a Bayesian density estimate…
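As an illustration of this curiosity, here is a minimal sketch under assumptions that are mine and not Larry's: g(x) = exp(−x²/2), so that c = 1/√(2π), a Gaussian kernel density estimate with a rule-of-thumb bandwidth, and x0 = 0. Any other density estimate (including a Bayesian one) could be plugged in instead.

```python
import math
import random
import statistics


def g(x):
    """Unnormalised density, assumed here to be the standard normal kernel."""
    return math.exp(-0.5 * x * x)


def kde_at(x0, sample, bandwidth):
    """Gaussian kernel density estimate of f evaluated at the point x0."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in sample)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


def c_hat(sample, x0=0.0):
    """Approximate c by f_hat(x0) / g(x0), with a rule-of-thumb bandwidth."""
    n = len(sample)
    bandwidth = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    return kde_at(x0, sample, bandwidth) / g(x0)


# Usage: the sample is simulated here, standing in for the observed x1,...,xn.
rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
estimate = c_hat(sample)
true_c = 1.0 / math.sqrt(2.0 * math.pi)
```

The approximation depends on the arbitrary choices of x0 and of the bandwidth, and its accuracy is capped by the size of the sample at hand.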

Furthermore, the estimate provided by the sample x1,…,xn is not of particular interest since its precision is imposed by the sample size n (and converging at non-parametric rates, which is not a particularly relevant issue!), while I could use importance sampling (or even numerical integration) if I was truly interested in c. I however find the discussion interesting for many reasons:

1. it somehow relates to the infamous harmonic mean estimator issue, often discussed on the ’Og!;
2. it brings more light on the paradoxical differences between statistics and Monte Carlo methods, in that statistics is usually constrained by the sample while Monte Carlo methods have more freedom in generating samples (up to some budget limits). It does not make sense to speak of estimators in Monte Carlo methods because there is no parameter in the picture, only “unknown” constants. Both fields rely on samples and probability theory, and share many features, but there is nothing like a “best unbiased estimator” in Monte Carlo integration, see the case of the “optimal importance function” leading to a zero variance;
3. in connection with the previous point, the fascinating Bernoulli factory problem is not a statistical problem because it requires an infinite sequence of Bernoullis to operate;
4. the discussion induced Chris Sims to contribute to StackExchange!
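The second point can be sketched numerically, again under my own illustrative assumptions: g(x) = exp(−x²/2) and normal importance functions N(0, σ²). Since 1/c is the integral of g, it is the expectation of g(X)/q(X) when X is drawn from q, and taking q proportional to g makes every weight equal to 1/c.

```python
import math
import random
import statistics


def g(x):
    """Unnormalised density, assumed here to be the standard normal kernel."""
    return math.exp(-0.5 * x * x)


def normal_pdf(x, sd):
    """Density of the N(0, sd^2) importance function."""
    z = x / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def importance_weights(proposal_sd, n, seed=0):
    """Weights g(x)/q(x) for n draws x from the importance function q."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, proposal_sd) for _ in range(n)]
    return [g(x) / normal_pdf(x, proposal_sd) for x in draws]


def estimate_c(weights):
    """The average weight approximates 1/c, so invert it."""
    return len(weights) / sum(weights)


# A generic importance function versus the "optimal" one, proportional to g.
wide = importance_weights(proposal_sd=2.0, n=20_000)
optimal = importance_weights(proposal_sd=1.0, n=20_000)
spread_wide = statistics.pstdev(wide)
spread_optimal = statistics.pstdev(optimal)  # zero up to rounding error
```

With the optimal importance function a single simulation already returns 1/c, but only because evaluating that importance function requires knowing c in the first place, which is why no notion of "best estimator" emerges from the exercise.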

## Error and Inference [end]

Posted in Books, Statistics, University life on October 11, 2011 by xi'an

(This is my sixth and last post on Error and Inference, being as previously a raw and naïve reaction born from a linear and sluggish reading of the book, rather than a deeper and more informed criticism with philosophical bearings. Read at your own risk.)

“It is refreshing to see Cox and Mayo give a hard-nosed statement of what scientific objectivity demands of an account of statistics, show how it relates to frequentist statistics, and contrast that with the notion of “objectivity” used by O-Bayesians.”—A. Spanos, p.326, Error and Inference, 2010

In order to conclude my pedestrian traverse of Error and Inference, I read the discussion by Aris Spanos of the second part of the seventh chapter, by David Cox and Deborah Mayo, discussed in the previous post. (On the train to the half-marathon to be precise, which may have added a sharper edge to the way I read it!) The first point in the discussion is that the above paper is “a harmonious blend of the Fisherian and N-P perspectives to weave a coherent frequentist inductive reasoning anchored firmly on error probabilities” (p.316). The discussion by Spanos is very much uncritical of the paper, so I will not engage in a criticism of the non-criticism, but rather set out some thoughts of mine that came from reading this apology. (Remarks about Bayesian inference are limited to some piques like the above, which only reiterate those found earlier [and later: “the various examples Bayesians employ to make their case involve some kind of “rigging” of the statistical model”, Aris Spanos, p.325; “The Bayesian epistemology literature is filled with shadows and illusions”, Clark Glymour, p. 335] in the book.) [I must add I do like the mention of O-Bayesians, as I coined the O’Bayes motto for the objective Bayes biennial meetings from 2003 onwards! It also reminds me of the O-rings and of the lack of proper statistical decision-making in the Challenger tragedy…]

The “general frequentist principle for inductive reasoning” (p.319) at the core of Cox and Mayo’s paper is obviously the central role of the p-value in “providing (strong) evidence against the null H0 (for a discrepancy from H0)”. Once again, I fail to see it as the epitome of a working principle in that

1. it depends on the choice of a divergence d(z), which reduces the information brought by the data z;
2. it does not articulate the level for labeling nor the consequences of finding a low p-value;
3. it ignores the role of the alternative hypothesis.
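To make the first point concrete, here is a toy sketch that is entirely of my own making (it does not appear in the book): the same data z, the same null hypothesis of N(0,1) observations, and two hypothetical divergences, one based on the sample mean and one based on the number of positive observations.

```python
import math


def p_value_mean(z):
    """Divergence d(z) = sqrt(n)|mean(z)|, which is |N(0,1)| under the null."""
    n = len(z)
    d = math.sqrt(n) * abs(sum(z) / n)
    return math.erfc(d / math.sqrt(2.0))  # two-sided normal tail probability


def p_value_sign(z):
    """Divergence d(z) = |#{z_i > 0} - n/2|, count is Binomial(n, 1/2) under the null."""
    n = len(z)
    k = sum(1 for zi in z if zi > 0)
    d = abs(k - n / 2)
    favourable = sum(math.comb(n, j) for j in range(n + 1) if abs(j - n / 2) >= d)
    return favourable / 2 ** n


# Eight small but all positive observations (made-up numbers).
z = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1]
p_mean = p_value_mean(z)
p_sign = p_value_sign(z)
```

On these made-up data the mean-based divergence returns a p-value of roughly 0.65, while the sign-based one returns 2/256, about 0.008. Each divergence keeps a different summary of z and hence delivers a different "evidence against the null".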

Furthermore, Spanos’ discussion deals with “the fallacy of rejection” (pp.319-320) in a rather artificial (if common) way, namely by setting a buffer of discrepancy γ around the null hypothesis. While the choice of a maximal degree of precision sounds natural to me (in the sense that a given sample size should not allow for the discrimination between two arbitrarily close values of the parameter), the fact that γ is in fine set by the data (so that the p-value is high) is fairly puzzling. If I understand correctly, the change from a p-value to a discrepancy γ is a fine device to make the “distance” from the null better understood, but it has an extremely limited range of application. If I do not understand correctly, the discrepancy γ is fixed by the statistician and then this sounds like an extreme form of prior selection.

There is at least one issue I do not understand in this part, namely the meaning of the severity evaluation probability

$P(d(Z) > d(z_0);\,\mu> \mu_1)$

as the conditioning on the event seems impossible in a frequentist setting. This leads me to an idle and unrelated questioning as to whether there is a solution to

$\sup_d \mathbb{P}_{H_0}(d(Z) \ge d(z_0))$

as this would be the ultimate discrepancy. Or whether this does not make any sense… because of the ambiguous role of z0, which needs somehow to be integrated out. (Otherwise, d can be chosen so that the probability is 1.)

“If one renounces the likelihood, the stopping rule, and the coherence principles, marginalizes the use of prior information as largely untrustworthy, and seek procedures with `good’ error probabilistic properties (whatever that means), what is left to render the inference Bayesian, apart from a belief (misguided in my view) that the only way to provide an evidential account of inference is to attach probabilities to hypotheses?”—A. Spanos, p.326, Error and Inference, 2010

The role of conditioning ancillary statistics is emphasized both in the paper and the discussion. This conditioning clearly reduces variability; however, there is no reservation about the arbitrariness of such ancillary statistics. Nor about the fact that conditioning any further would lead to conditioning upon the whole data, i.e. to a Bayesian solution. I also noted a curious lack of proper logical reasoning in the argument that, when

$f(z|\theta) \propto f(z|s) f(s|\theta),$

using the conditional ancillary distribution is enough, since “any departure from f(z|s) implies that the overall model is false” (p.322), while the reverse does not hold. Hence, a poor choice of s may fail to detect a departure. (Besides the fact that fixed-dimension sufficient statistics do not exist outside exponential families.) Similarly, Spanos expands on the case of a minimal sufficient statistic that is independent from a maximal ancillary statistic, but such cases are quite rare and limited to exponential families [in the iid case]. Still in the conditioning category, he also supports Mayo’s argument against the likelihood principle being a consequence of the sufficiency and weak conditionality principles. A point I discussed in a previous post. However, he does not provide further evidence against Birnbaum’s result, arguing rather in favour of a conditional frequentist inference I have nothing to complain about. (I fail to perceive the appeal of the Welch uniform example in terms of the likelihood principle.)

In an overall conclusion, let me repeat and restate that this series of posts about Error and Inference is far from pretending to bring a Bayesian reply to the philosophical arguments raised in the volume. The primary goal being that of “taking some crucial steps towards legitimating the philosophy of frequentist statistics” (p.328), I should not feel overly concerned. It is only when the debate veered towards a comparison with the Bayesian approach [often, too often, of the “holier than thou” brand] that I felt allowed to put in my two pennies’ worth… I do hope I may crystallise this set of notes into a more constructed review of the book, if time allows, although I am pessimistic about the chances of getting it published given our current difficulties with the critical review of Murray Aitkin’s Statistical Inference. However, as a coincidence, we got back last weekend an encouraging reply from Statistics and Risk Modelling, prompting us towards a revision and the prospect of a reply by Murray.