Archive for Akaike’s criterion

Naturally amazed at non-identifiability

Posted in Books, Statistics, University life with tags , , , , , , , , , , , on May 27, 2020 by xi'an

A Nature paper by Stilianos Louca and Matthew W. Pennell,  Extant time trees are consistent with a myriad of diversification histories, comes to the extraordinary conclusion that birth-&-death evolutionary models cannot distinguish between several scenarios given the available data! Namely, stem ages and daughter lineage ages cannot identify the speciation rate function λ(.), the extinction rate function μ(.)  and the sampling fraction ρ inherently defining the deterministic ODE leading to the number of species predicted at any point τ in time, N(τ). The Nature paper does not seem to make a point beyond the obvious and I am rather perplexed at why it got published [and even highlighted]. A while ago, under the leadership of Steve, PNAS decided to include statistician reviewers for papers relying on statistical arguments. It could time for Nature to move there as well.

“We thus conclude that two birth-death models are congruent if and only if they have the same rp and the same λp at some time point in the present or past.” [S.1.1, p.4]

Or, stated otherwise, that a tree structured dataset made of branch lengths are not enough to identify two functions that parameterise the model. The likelihood looks like

\frac{\rho^{n-1}\Psi(\tau_1,\tau_0)}{1-E(\tau)}\prod_{i=1}^n \lambda(\tau_i)\Psi(s_{i,1},\tau_i)\Psi(s_{i,2},\tau_i)$

where E(.) is the probability to survive to the present and ψ(s,t) the probability to survive and be sampled between times s and t. Sort of. Both functions depending on functions λ(.) and  μ(.). (When the stem age is unknown, the likelihood changes a wee bit, but with no changes in the qualitative conclusions. Another way to write this likelihood is in term of the speciation rate λp


where Λp is the integrated rate, but which shares the same characteristic of being unable to identify the functions λ(.) and μ(.). While this sounds quite obvious the paper (or rather the supplementary material) goes into fairly extensive mode, including “abstract” algebra to define congruence.


“…we explain why model selection methods based on parsimony or “Occam’s razor”, such as the Akaike Information Criterion and the Bayesian Information Criterion that penalize excessive parameters, generally cannot resolve the identifiability issue…” [S.2, p15]

As illustrated by the above quote, the supplementary material also includes a section about statistical model selections techniques failing to capture the issue, section that seems superfluous or even absurd once the fact that the likelihood is constant across a congruence class has been stated.

An objective prior that unifies objective Bayes and information-based inference

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , on June 8, 2015 by xi'an

vale9During the Valencia O’Bayes 2015 meeting, Colin LaMont and Paul Wiggins arxived a paper entitled “An objective prior that unifies objective Bayes and information-based inference”. It would have been interesting to have the authors in Valencia, as they make bold claims about their w-prior as being uniformly and maximally uninformative. Plus achieving this unification advertised in the title of the paper. Meaning that the free energy (log transform of the inverse evidence) is the Akaike information criterion.

The paper starts by defining a true prior distribution (presumably in analogy with the true value of the parameter?) and generalised posterior distributions as associated with any arbitrary prior. (Some notations are imprecise, check (3) with the wrong denominator or the predictivity that is supposed to cover N new observations on p.2…) It then introduces a discretisation by considering all models within a certain Kullback divergence δ to be undistinguishable. (A definition that does not account for the assymmetry of the Kullback divergence.) From there, it most surprisingly [given the above discretisation] derives a density on the whole parameter space

\pi(\theta) \propto \text{det} I(\theta)^{1/2} (N/2\pi \delta)^{K/2}

where N is the number of observations and K the dimension of θ. Dimension which may vary. The dependence on N of the above is a result of using the predictive on N points instead of one. The w-prior is however defined differently: “as the density of indistinguishable models such that the multiplicity is unity for all true models”. Where the log transform of the multiplicity is the expected log marginal likelihood minus the expected log predictive [all expectations under the sampling distributions, conditional on θ]. Rather puzzling in that it involves the “true” value of the parameter—another notational imprecision, since it has to hold for all θ’s—as well as possibly improper priors. When the prior is improper, the log-multiplicity is a difference of two terms such that the first term depends on the constant used with the improper prior, while the second one does not…  Unless the multiplicity constraint also determines the normalising constant?! But this does not seem to be the case when considering the following section on normalising the w-prior. Mentioning a “cutoff” for the integration that seems to pop out of nowhere. Curiouser and curiouser. Due to this unclear handling of infinite mass priors, and since the claimed properties of uniform and maximal uninformativeness are not established in any formal way, and since the existence of a non-asymptotic solution to the multiplicity equation is neither demonstrated, I quickly lost interest in the paper. Which does not contain any worked out example. Read at your own risk!

reading classics (#4,5,6)

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , on December 9, 2013 by xi'an

La Défense from Paris-Dauphine, Nov. 15, 2012This week, thanks to a lack of clear instructions (from me) to my students in the Reading Classics student seminar, four students showed up with a presentation! Since I had planned for two teaching blocks, three of them managed to fit within the three hours, while the last one nicely accepted to wait till next week to present a paper by David Cox…

The first paper discussed therein was A new look at the statistical model identification, written in 1974 by Hirotugu Akaike. And presenting the AIC criterion. My student Rozan asked to give the presentation in French as he struggled with English, but it was still a challenge for him and he ended up being too close to the paper to provide a proper perspective on why AIC is written the way it is and why it is (potentially) relevant for model selection. And why it is not such a definitive answer to the model selection problem. This is not the simplest paper in the list, to be sure, but some intuition could have been built from the linear model, rather than producing the case of an ARMA(p,q) model without much explanation. (I actually wonder why the penalty for this model is (p+q)/T, rather than (p+q+1)/T for the additional variance parameter.) Or simulation ran on the performances of AIC versus other xIC’s…

The second paper was another classic, the original GLM paper by John Nelder and his coauthor Wedderburn, published in 1972 in Series B. A slightly easier paper, in that the notion of a generalised linear model is presented therein, with mathematical properties linking the (conditional) mean of the observation with the parameters and several examples that could be discussed. Plus having the book as a backup. My student Ysé did a reasonable job in presenting the concepts, but she would have benefited from this extra-week in including properly the computations she ran in R around the glm() function… (The definition of the deviance was somehow deficient, although this led to a small discussion during the class as to how the analysis of deviance was extending the then flourishing analysis of variance.) In the generic definition of the generalised linear models, I was also reminded of the
generality of the nuisance parameter modelling, which made the part of interest appear as an exponential shift on the original (nuisance) density.

The third paper, presented by Bong, was yet another classic, namely the FDR paper, Controlling the false discovery rate, of Benjamini and Hochberg in Series B (which was recently promoted to the should-have-been-a-Read-Paper category by the RSS Research Committee and discussed at the Annual RSS Conference in Edinburgh four years ago, as well as published in Series B). This 2010 discussion would actually have been a good start to discuss the paper in class, but Bong was not aware of it and mentioned earlier papers extending the 1995 classic. She gave a decent presentation of the problem and of the solution of Benjamini and Hochberg but I wonder how much of the novelty of the concept the class grasped. (I presume everyone was getting tired by then as I was the only one asking questions.) The slides somewhat made it look too much like a simulation experiment… (Unsurprisingly, the presentation did not include any Bayesian perspective on the approach, even though they are quite natural and emerged very quickly once the paper was published. I remember for instance the Valencia 7 meeting in Teneriffe where Larry Wasserman discussed about the Bayesian-frequentist agreement in multiple testing.)

Evidence and evolution (2)

Posted in Books, Statistics with tags , , , , , , , , , , , on April 9, 2010 by xi'an

“When dealing with natural things we will, then, never derive any explanations from the purpose which God or nature may have had in view when creating them and we shall entirely banish from our philosophy the search for final causes. For we should not be so arrogant as to suppose that we can share in God’s plans.” René Descartes, Les Principes de la Philosophie, Livre I, 28

I have now read the second chapter of the book Evidence and Evolution: The Logic Behind the Science by Elliott Sober. The very chapter which title is “Intelligent design”… As posted earlier, I was loath to get into this chapter for fear of being dragged into a nonsensical debate. In fact, the chapter is written from a purely philosophical/logical perspective, while I was looking for statistical arguments given the tenor of the first chapter (reviewing the differences between Bayesians, likelihoodists (sic!), and frequentists). There is therefore very little I can contribute to the debate, being no philosopher of science. I find the introduction of the chapter interesting in that it relates the creationism /”intelligent design” thesis to a long philosophical tradition (witness the above quote from Descartes) rather than to the current political debate about “teaching” creationism in US and UK schools. The disputation of older theses like Paley’s watch is however taking most of the chapter which is disappointing in my humble opinion. In a sense, Sober mostly states the obvious when arguing that when gods or other supernatural beings enter the picture, they can explain for any observed fact with the highest likelihood while being unable to predict any fact not yet observed. I would have prefered to see hard scientific facts and the use of statistical evidence, even of the AIC sort! The call to Popper’s testability does not bring further arguments because Sober also defends the thesis that even the theory of “intelligent” design is falsifiable… In Section 2.19 about model selection, the comparison between a single parameter model and a one million parameter model hints at Ockham’s razor, but Sober misses the point about a  major aspect of Bayesian analysis, which is that by the virtue of hyperpriors and hyperparameters, observations about one group of parameters also brings information about another group of parameters when those are related via a hyperprior (as in small area estimation). Given that the author never discusses the use of priors over the model parameters and uses instead pluggin estimates, he does not take advantage of the marginal posterior dependence between the different groups of parameters.

Evidence and evolution

Posted in Books, Statistics with tags , , , , , , , on April 1, 2010 by xi'an

I have received the book Evidence and Evolution: The Logic Behind the Science by Elliott Sober to review. The book is written by a philosopher of science who has worked on the notion of evidence, in the statistical meaning of the word. I am currently reading the first chapter which is fairly well written and which presents a reasonable picture on the different perspectives (Bayesian, likelihood, frequentist) used for hypothesis testing and model choice. Akaike’s information criterion is a wee too much promoted but that’s the author’s choice after all. However I just came yesterday upon a section where Sober reproduces the error central to Templeton’s thesis and discussed on the Og a few days ago. He indeed states that “the simpler model cannot have the higher prior probability—a point that Popper (1959) emphasized.” And he insists further that there is no reason for thinking that

P(\theta=0) > P(\theta>0)

is true (page 84). (The measure-theoretic objections raised earlier obviously apply there as well.) It must thus be more of a common misconception among philosophers of science than I previously thought….

As described on the backcover, the purpose of the book is

“How should the concept of evidence be understood? And how does the concept of evidence apply to the controversy about creationism as well as to work in evolutionary biology about natural selection and common ancestry? In this rich and wide-ranging book, Elliott Sober investigates general questions about probability and evidence and shows how the answers he develops to those questions apply to the specifics of evolutionary biology. Drawing on a set of fascinating examples, he analyzes whether claims about intelligent design are untestable; whether they are discredited by the fact that many adaptations are imperfect; how evidence bears on whether present species trace back to common ancestors; how hypotheses about natural selection can be tested, and many other issues. His book will interest all readers who want to understand philosophical questions about evidence and evolution, as they arise both in Darwin’s work and in contemporary biological research.”

Sober applies these concepts of evidence to some versions of creationism… I am obviously reluctant to go through this second chapter about creationism as there is no use in arguing about the existence of gods in a book about science, but I am still curious to see how Sober analyses this issue.