## arXives

Posted in Books, Statistics on March 31, 2010 by xi'an

Yesterday, I finally arXived my notes on Keynes’ book A Treatise On Probability, but, due to the new way the arXiv website operates, there is no indication of the page associated with the submitted paper before it gets accepted and I cannot thus prepare an Og’ entry until this acceptance, wasting perfect timing! Anyway, this is the first draft of the notes and it has not yet been submitted to a journal. As the new user interface on the arXiv webpage now displays all past papers, I added a category on our 2007 Annals paper with Randal Douc, Arnaud Guillin and Jean-Michel Marin, which means it appeared again in today’s list…

Today I completed my revision of the review of Burdzy’s The Search for Certainty for Bayesian Analysis, so the new version will be on arXiv tomorrow morning. The changes are really minor, as Bayesian Analysis mostly requested toning down my criticisms. I also added a few more quotes and some sentences in the conclusion. I wonder if this paper will appear with a discussion, since three are already written!

Finally, let me point out three recent interesting postings on arXiv, in case I do not have time to discuss them in more depth: one by Peter Green on Colouring and breaking sticks: random distributions and heterogeneous clustering, one by Nicolas Chopin, Tony Lelièvre and Gabriel Stoltz on Free energy methods for efficient exploration of mixture posterior densities, and one by Sophie Donnet and Jean-Michel Marin on An empirical Bayes procedure for the selection of Gaussian graphical models.

## Keynes and the Society for imprecise probability

Posted in Books, Statistics on March 30, 2010 by xi'an

When completing my comments on Keynes’ A Treatise On Probability, thanks to an Og’s reader, I found that Keynes is held in high esteem (as a probabilist) by the members of the Society for Imprecise Probability. The goals of the society are set as

The Society for Imprecise Probability: Theories and Applications (SIPTA) was created in February 2002, with the aim of promoting the research on imprecise probability. This is done through a series of activities for bringing together researchers from different groups, creating resources for information, dissemination and documentation, and making other people aware of the potential of imprecise probability models.

The Society has its roots in the Imprecise Probabilities Project conceived in 1996 by Peter Walley and Gert de Cooman and its creation has been encouraged by the success of the ISIPTA conferences.

Imprecise probability is understood in a very wide sense. It is used as a generic term to cover all mathematical models which measure chance or uncertainty without sharp numerical probabilities. It includes both qualitative (comparative probability, partial preference orderings, …) and quantitative modes (interval probabilities, belief functions, upper and lower previsions, …). Imprecise probability models are needed in inference problems where the relevant information is scarce, vague or conflicting, and in decision problems where preferences may also be incomplete.

The society sees J.M. Keynes as a precursor of the Dempster-Shafer perspective on probability, whose Bayesian version is represented in Peter Walley’s book, Statistical Reasoning with Imprecise Probabilities, because of the remark made by Keynes in A Treatise On Probability (Chapter XV) that “many probabilities can be placed between numerical limits”. Given that the book does not elaborate on how to take advantage of this generalisation of probabilities, but instead sees it as an impediment to probabilising the parameter space, I would think this remark is more representative of the general confusion made between true (i.e. model related) probabilities and their (observation based) estimates.

## Keynes’ derivations

Posted in Books, Statistics on March 29, 2010 by xi'an

Chapter XVII of Keynes’ A Treatise On Probability contains Keynes’ most noteworthy contribution to Statistics, namely the classification of probability distributions such that the arithmetic/geometric/harmonic empirical mean/empirical median is also the maximum likelihood estimator. This problem was first stated by Laplace and Gauss (leading to the Laplace distribution in connection with the median and to the Gaussian distribution for the arithmetic mean). The derivation of the densities $f(x,\theta)$ of those probability distributions is based on the constraint that the likelihood equation

$\sum_{i=1}^n \dfrac{\partial}{\partial\theta}\log f(y_i,\theta) = 0$

is satisfied for one of the four empirical estimates, using differential calculus (despite the fact that Keynes earlier derived Bayes’ theorem by assuming the parameter space to be discrete). Under regularity assumptions, in the case of the arithmetic mean, my colleague Eric Séré showed me this indeed leads to the family of distributions

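$f(x,\theta) = \exp\left\{ \phi^\prime(\theta) (x-\theta) + \phi(\theta) + \psi(x) \right\}\,,$

where $\phi$ and $\psi$ are almost arbitrary functions under the constraints that $\phi$ is twice differentiable and $f(x,\theta)$ is a density in $x$. (Differentiating $\log f$ in $\theta$, the two terms in $\phi^\prime(\theta)$ cancel, so the likelihood equation reduces to $\phi^{\prime\prime}(\theta)\sum_i (y_i-\theta)=0$, which is solved by the arithmetic mean.) The density constraint means that $\phi$ satisfies

$\phi(\theta) = -\log \int \exp \left\{ \phi^\prime(\theta) (x-\theta) + \psi(x)\right\}\, \text{d}x\,,$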

a constraint missed by Keynes.
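As a quick numerical sanity check of the four cases listed at the start of this post, the following sketch minimises the negative log-likelihood over a grid and compares the result with the corresponding empirical average. The simulated samples, the grid search and the specific densities (Gaussian, Laplace, normal in $\log x$, normal in $1/x$) are my own illustrative choices and are not taken from Keynes.

```python
import numpy as np

rng = np.random.default_rng(0)

def grid_argmin(objective, lo, hi, n_grid=20001):
    """Return the grid point minimising `objective` on [lo, hi], and the grid step."""
    grid = np.linspace(lo, hi, n_grid)
    values = np.array([objective(t) for t in grid])
    return grid[np.argmin(values)], grid[1] - grid[0]

# Real-valued sample for the arithmetic mean and the median (odd size: unique median).
y = rng.normal(loc=2.0, scale=1.5, size=201)
# Positive sample for the geometric and harmonic means.
x = rng.lognormal(mean=0.3, sigma=0.5, size=201)

# Negative log-likelihoods in theta, up to terms that do not involve theta.
cases = {
    "arithmetic mean": (lambda t: np.sum((y - t) ** 2), y),                  # Gaussian
    "median": (lambda t: np.sum(np.abs(y - t)), y),                          # Laplace
    "geometric mean": (lambda t: np.sum((np.log(x) - np.log(t)) ** 2), x),   # normal in log x
    "harmonic mean": (lambda t: np.sum((1 / x - 1 / t) ** 2), x),            # normal in 1/x
}
targets = {
    "arithmetic mean": np.mean(y),
    "median": np.median(y),
    "geometric mean": np.exp(np.mean(np.log(x))),
    "harmonic mean": 1 / np.mean(1 / x),
}

results = {}
for name, (objective, sample) in cases.items():
    estimate, step = grid_argmin(objective, sample.min(), sample.max())
    results[name] = (estimate, targets[name], step)
    print(f"{name:16s} grid MLE = {estimate:.4f}   empirical value = {targets[name]:.4f}")
```

In each case the grid maximiser of the likelihood should agree with the empirical average up to the grid resolution. The Jacobian factors $1/x$ and $1/x^2$ do not involve $\theta$ and therefore play no role in this particular check, although they do matter for the densities themselves.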

While I cannot judge the level of novelty in Keynes’ derivation with respect to earlier works, this derivation therefore produces a generic form of unidimensional exponential family, twenty-five years before their rederivation by Darmois (1935), Pitman (1936) and Koopman (1936) as characterising distributions with sufficient statistics of constant dimension. The derivation of the distributions for which the geometric or the harmonic means are MLEs then follows by a change of variables, $y=\log x,\,\lambda=\log \theta$ or $y=1/x,\,\lambda=1/\theta$, respectively. In those different derivations, the normalisation issue is treated quite off-handedly by Keynes, witness the function

$f(x,\theta) = A \left( \dfrac{\theta}{x} \right)^{k\theta} e^{-k\theta}$

at the bottom of page 198, which is not integrable in $x$ unless its support is bounded away from 0 or $\infty$. Similarly, the derivation of the log-normal density on page 199 is missing the Jacobian factor $1/x$ (or $1/y_q$ in Keynes’ notations) and the same problem arises for the inverse-normal density, which should be

$f(x,\theta) = A e^{-k^2(x-\theta)^2/\theta^2 x^2} \dfrac{1}{x^2}\,,$

instead of $A\exp k^2(\theta-x)^2/x$ (page 200). Finally, I find the derivation of the distributions linked with the median rather dubious since Keynes’ general solution

$f(x,\theta) = A \exp \left\{ \displaystyle{\int \dfrac{y-\theta}{|y-\theta|}\,\phi^{\prime\prime}(\theta)\,\text{d}\theta +\psi(x) }\right\}$

(where the integral ought to be interpreted as a primitive) is such that the recovery of Laplace’s distribution, $f(x,\theta)\propto \exp\{-k^2|x-\theta|\}$, involves setting (page 201)

$\psi(x) = \dfrac{\theta-x}{|x-\theta|}\,k^2 x\,,$

hence making $\psi$ a function of $\theta$ as well. The summary two pages later actually produces an alternative generic form, namely

$f(x,\theta) = A \exp\left\{ \phi^\prime(\theta)\dfrac{x-\theta}{|x-\theta|}+\psi(x) \right\}\,,$

with the difficulties that the distribution only vaguely depends on $\theta$, being then a step function times $\exp(\psi(x))$, and that, unless $\phi$ is properly calibrated, $A$ also depends on $\theta$.
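To make the last remark concrete, here is a short worked check of the normalisation, under the assumption (mine, not Keynes’) that $e^{\psi}$ is integrable over the real line. Since $(x-\theta)/|x-\theta|$ equals $-1$ for $x<\theta$ and $+1$ for $x>\theta$, integrating the summary form in $x$ gives

$A(\theta)^{-1} = e^{-\phi^\prime(\theta)} \displaystyle\int_{-\infty}^{\theta} e^{\psi(x)}\,\text{d}x + e^{\phi^\prime(\theta)} \displaystyle\int_{\theta}^{\infty} e^{\psi(x)}\,\text{d}x\,,$

so that $A$ is free of $\theta$ only when $\phi^\prime$ is tuned to $\psi$ in such a way that the right-hand side is constant. The density itself is $e^{\psi(x)}$ multiplied by a function taking only the two values $A(\theta)e^{-\phi^\prime(\theta)}$ and $A(\theta)e^{\phi^\prime(\theta)}$, with a single jump at $x=\theta$.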

Given that this part is the most technical section of the book, this post shows why I am fairly disappointed at having picked this book for my reading seminar. There is no further section with innovative methodological substance in the remainder of the book, which now appears to me as no better than a graduate dissertation on the probabilistic and statistical literature of the (not that) late 19th century, modulo the (inappropriate) highly critical tone.

## Incoherent inference

Posted in Statistics, University life on March 28, 2010 by xi'an

“The probability of the nested special case must be less than or equal to the probability of the general model within which the special case is nested. Any statistic that assigns greater probability to the special case is incoherent. An example of incoherence is shown for the ABC method.” Alan Templeton, PNAS, 2010

Alan Templeton just published an article in PNAS about “coherent and incoherent inference” (with applications to phylogeography and human evolution). While his (misguided) arguments are mostly those found in an earlier paper of his, discussed in this post as well as in the defence of model-based inference twenty-two of us published in Molecular Ecology a few months ago, the paper contains a more general critical perspective on Bayesian model comparison, lining up argument after argument about the incoherence of the Bayesian approach (and not of ABC, as presented there). The notion of coherence is borrowed from the 1991 (Bayesian) paper of Lavine and Schervish on Bayes factors, which shows that Bayes factors may be non-monotonic in the alternative hypothesis (but also that posterior probabilities aren’t!). Templeton’s first argument proceeds from the quote above, namely that larger models should have larger probabilities or else this violates logic and coherence! The author presents the reader with a Venn diagram to explain why a larger set should have a larger measure. Obviously, he does not account for the fact that in model choice, different models induce different parameter spaces and that those spaces are endowed with orthogonal measures, especially if the spaces are of different dimensions. In the larger space, $P(\theta_1=0)=0$. (This point is not even touching the issue of defining “the” probability over the collection of models that Templeton seems to take for granted but that does not make sense outside a Bayesian framework.) Talking therefore of nested models having a smaller probability than the encompassing model or of “partially overlapping models” does not make sense from a measure-theoretic (hence mathematical) perspective. (The fifty-one occurrences of coherent/incoherent in the paper do not bring additional weight to the argument!)
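A toy illustration of this point, with a model and numbers that are entirely my own and have nothing to do with Templeton's examples: take $M_1: x\sim N(0,1)$ nested within $M_2: x\sim N(\theta,1)$ with $\theta\sim N(0,\tau^2)$. The two models are mutually exclusive values of the model index, each with its own marginal likelihood, and nothing prevents the nested model from receiving the larger posterior probability.

```python
import math

def normal_pdf(x, var):
    """Density at x of a centred normal distribution with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def posterior_model_probs(x, tau2, prior_m1=0.5):
    """Posterior probabilities of M1 (theta = 0) and M2 (theta ~ N(0, tau2))."""
    m1 = normal_pdf(x, 1.0)          # marginal likelihood under M1
    m2 = normal_pdf(x, 1.0 + tau2)   # marginal under M2, theta integrated out
    w1 = prior_m1 * m1
    w2 = (1.0 - prior_m1) * m2
    return w1 / (w1 + w2), w2 / (w1 + w2)

# Hypothetical observation and prior variance, chosen for illustration only.
p_m1, p_m2 = posterior_model_probs(x=0.5, tau2=10.0)
print(f"P(M1 | x) = {p_m1:.3f}, P(M2 | x) = {p_m2:.3f}")
```

The two posterior probabilities add up to one and the nested model gets the larger share for this observation, without any violation of the axioms of probability, since under $M_2$ the event $\theta=0$ has probability zero.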

“Approximate Bayesian computation (ABC) is presented as allowing statistical comparisons among models. ABC assigns posterior probabilities to a finite set of simulated a priori models.” Alan Templeton, PNAS, 2010

An issue common to all recent criticisms by Templeton is the misleading or misled confusion between the ABC method and the resulting Bayesian inference. For instance, Templeton distinguishes the incoherence in the ABC model choice procedure from the incoherence in the Bayes factor, when ABC is used as a computational device to approximate the Bayes factor. There is therefore no inferential aspect linked with ABC per se: it is simply a numerical tool to approximate Bayesian procedures and, with enough computer power, the approximation can get as precise as one wishes. In this paper, Templeton also reiterates the earlier criticism that marginal likelihoods are not comparable across models, because they “are not adjusted for the dimensionality of the data or the models” (sic!). This point is missing the whole purpose of using marginal likelihoods, namely that they account for the dimensionality of the parameter by providing a natural Ockham’s razor penalising the larger model without requiring the specification of a penalty term. (If necessary, the so successful BIC provides an approximation to this penalty, as does the alternative DIC.) The second criticism of ABC (i.e. of the Bayesian approach) is that model choice requires a collection of models and cannot decide outside this collection. This is indeed the purpose of a Bayesian model choice and studies like Berger and Sellke (1987, JASA) have shown the difficulty of reasoning within a single model. Furthermore, Templeton advocates the use of a likelihood ratio test, which necessarily implies using two models. Another Venn diagram also explains why Bayes formula when used for model choice is “mathematically and logically incorrect” because marginal likelihoods cannot be added up when models “overlap”: according to him, “there can be no universal denominator, because a simple sum always violates the constraints of logic when logically overlapping models are tested”. Once more, this simply shows a poor understanding of the probabilistic modelling involved in model choice.
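To illustrate that ABC is only a numerical approximation of the Bayesian answer, here is a minimal rejection-ABC sketch for model choice, on a toy pair of models of my own choosing ($M_1: x\sim N(0,1)$ versus $M_2: x\sim N(\theta,1)$ with $\theta\sim N(0,\tau^2)$), where the exact posterior probability is available in closed form. The tolerance `eps`, the number of simulations and the observed value are arbitrary assumptions.

```python
import math
import numpy as np

def exact_prob_m1(x_obs, tau2):
    """Exact posterior probability of M1 under equal prior model weights."""
    m1 = math.exp(-0.5 * x_obs**2) / math.sqrt(2 * math.pi)
    m2 = math.exp(-0.5 * x_obs**2 / (1 + tau2)) / math.sqrt(2 * math.pi * (1 + tau2))
    return m1 / (m1 + m2)

def abc_prob_m1(x_obs, tau2, eps, n_sim, seed=0):
    """Rejection ABC: frequency of M1 among simulations landing within eps of x_obs."""
    rng = np.random.default_rng(seed)
    in_m2 = rng.random(n_sim) < 0.5                        # model index from its prior
    theta = np.where(in_m2, rng.normal(0.0, math.sqrt(tau2), n_sim), 0.0)
    x_sim = rng.normal(theta, 1.0)                         # pseudo-data given (model, theta)
    accepted = np.abs(x_sim - x_obs) < eps                 # keep simulations close to the data
    return float(np.mean(~in_m2[accepted])), int(accepted.sum())

x_obs, tau2 = 0.5, 10.0
exact = exact_prob_m1(x_obs, tau2)
approx, n_accepted = abc_prob_m1(x_obs, tau2, eps=0.05, n_sim=1_000_000)
print(f"exact P(M1 | x) = {exact:.3f}, ABC estimate = {approx:.3f} ({n_accepted} accepted)")
```

Shrinking `eps` while increasing `n_sim` brings the ABC frequency as close to the exact posterior probability as desired, which is all there is to the "inference" contributed by ABC in this setting.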

“The central equation of ABC is inherently incoherent for three separate reasons, two of which are applicable in every case that deals with overlapping hypotheses.” Alan Templeton, PNAS, 2010

This argument relies on the representation of the “ABC equation” (sic!)

$P(H_i|H,S^*) = \dfrac{G_i(||S_i-S^*||) \Pi_i}{\sum_{j=1}^n G_j(||S_j-S^*||) \Pi_j}$

where $S^*$ is the observed summary statistic, $S_i$ is “the vector of expected (simulated) summary statistics under model $i$” and “$G_i$ is a goodness-of-fit measure”. Templeton states that this “fundamental equation is mathematically incorrect in every instance (...) of overlap.” This representation of the ABC approximation is again misleading or misled in that the ABC simulation algorithm produces an approximation to a posterior sample from $\pi_i(\theta_i|S^*)$. The resulting approximation to the marginal likelihood under model $M_i$ is a regular Monte Carlo step that replaces an integral with a weighted sum, not a “goodness-of-fit measure.” The subsequent argument of Templeton’s about the goodness-of-fit measures being “not adjusted for the dimensionality of the data” (re-sic!) and the resulting incoherence is therefore void of substance. The following argument repeats an earlier misunderstanding of the probabilistic model involved in Bayesian model choice: the reasoning that, if

$\sum_j \Pi_j = 1$

“the constraints of logic are violated [and] the prior probabilities used in the very first step of their Bayesian analysis are incoherent”, fails to grasp the issue of measures over mutually exclusive spaces.

“ABC is used for parameter estimation in addition to hypothesis testing and another source of incoherence is suggested from the internal discrepancy between the posterior probabilities generated by ABC and the parameter estimates found by ABC.” Alan Templeton, PNAS, 2010

The point corresponding to the above quote is that, while the posterior probability that $\theta_1=0$ (model $M_1$) is much higher than the posterior probability of the opposite (model $M_2$), the Bayes estimate of $\theta_1$ under model $M_2$ is “significantly different from zero”. Again, this reflects both a misunderstanding of the probability model, namely that $\theta_1=0$ is impossible [has measure zero] under model $M_2$, and a confusion between confidence intervals (that are model specific) and posterior probabilities (that work across models). The concluding message that “ABC is a deeply flawed Bayesian procedure in which ignorance overwhelms data to create massive incoherence” is thus unsubstantiated.
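The two statements are perfectly compatible, as the following sketch shows on a hypothetical normal example (the sample size, prior variance and observed mean are my own choices, not Templeton's data): with $\bar x\sim N(\theta,1/n)$, $M_1:\theta=0$ and $M_2:\theta\sim N(0,\tau^2)$, the interval computed within $M_2$ can exclude zero while $M_1$ has a high posterior probability, a situation usually discussed as the Jeffreys-Lindley paradox.

```python
import math

def nested_summary(xbar, n, tau2, prior_m1=0.5):
    """Posterior probability of M1 and 95% interval for theta within M2.

    M1: theta = 0; M2: theta ~ N(0, tau2); in both cases xbar | theta ~ N(theta, 1/n).
    """
    z2 = n * xbar**2
    shrink = n * tau2 / (1.0 + n * tau2)
    # Bayes factor of M1 against M2, as a ratio of marginal densities of xbar.
    bf_12 = math.sqrt(1.0 + n * tau2) * math.exp(-0.5 * z2 * shrink)
    p_m1 = prior_m1 * bf_12 / (prior_m1 * bf_12 + (1.0 - prior_m1))
    # Posterior of theta within M2 is normal with the following moments.
    post_mean = shrink * xbar
    post_sd = math.sqrt(tau2 / (1.0 + n * tau2))
    interval = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)
    return p_m1, interval

# Hypothetical values: sqrt(n) * xbar = 2.2, i.e. "significant" at the usual 5% level.
p_m1, (lower, upper) = nested_summary(xbar=0.022, n=10_000, tau2=1.0)
print(f"P(M1 | data) = {p_m1:.3f}; 95% interval under M2 = ({lower:.4f}, {upper:.4f})")
```

The interval is a statement conditional on $M_2$ being the model, whereas the posterior probability compares the two models, so there is no internal discrepancy between them.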

“Incoherent methods, such as ABC, Bayes factor, or any simulation approach that treats all hypotheses as mutually exclusive, should never be used with logically overlapping hypotheses.” Alan Templeton, PNAS, 2010

In conclusion, I am quite surprised at this controversial piece of work being published in PNAS, as the mathematical and statistical arguments of Professor Templeton should have been assessed by referees who are mathematicians and statisticians, in which case they would have spotted the obvious inconsistencies!

## La Voie du Rige

Posted in Books, Kids on March 27, 2010 by xi'an

I have just received from Amazon the last tome of the series La quête de l’oiseau du temps written by Serge Le Tendre and drawn by Régis Loisel. (The series has been partly translated into English as Roxanna and the quest for the time-bird.) I have loved this series since the first cycle was published, about twenty years ago, and the second cycle (which takes place thirty to forty years before) is even better! The quality of the drawings by Loisel is superb, with a clear mastery of colour and shade, and the fantasy plot is deep enough to keep the series more than attractive. The end of the first cycle was highly climactic and the second cycle contains enough threads and first-rate characters to be gripping despite the slow publication rate (the first volume appeared in 2004 and the second one in 2007, as advertised below).

The story is slightly less innovative/informative in the third tome than in the two previous ones, with less of a “global picture”, but the central character of “Le Rige” only appears after a tense expectation and the final redemption words (“do you want to become my pupil?”) are quite surprising.

## Shiner

Posted in Travel, Wines on March 26, 2010 by xi'an

Not that I want to start a Bier category, but this Texan bier tasted during the Frontiers of Statistical Decision Making and Bayesian Analysis conference was quite pleasant, besides sporting a cool label!

## More on Treatise

Posted in Books, Statistics on March 25, 2010 by xi'an

When writing my review of Keynes’ A Treatise On Probability, I found that there is a very detailed review paper by John Aldrich (2008) that covers the beginnings of Keynes as a statistician, entitled “Keynes among the Statisticians” (sic!) and published in the journal History of Political Economy. This review is incredibly helpful in placing the book back in the context of its time. I also discovered through this review that Harold Jeffreys reviewed A Treatise On Probability in Nature and that Dennis Lindley wrote an Encyclopedia entry on Keynes, no less…