## relativity of falsification?

Posted in Books, Kids on October 13, 2013 by xi'an

“It seems entirely reasonable to believe in the effectiveness of T.C.M. and still have grave doubts about qi… The causal theory that’s concocted to explain the practical successes of treatment is not terribly important or interesting to the poor schlub who’s thrown out his back or taken ill.”

After writing that piece on the train, I alas missed my flight to Warwick (by 3 minutes, not due to writing the post!) and then checked the paper on the Web, where I found this much more detailed criticism by Jerry Coyne (professor at the University of Chicago and author of a book called Why Evolution Is True).

## genetics

Posted in Books, Kids, Travel, University life on April 9, 2012 by xi'an

Today, I was reading in the science leaflet of Le Monde about a new order of magnitude in sequencing cancerous tumors (wrong link, I know…). This made me wonder whether the sequence of (hundreds of) mutations leading from a normal cell to a cancerous one could be reconstructed the way a genealogy is. (This reminds me of another exciting genetics article I read in the Eurostar back from London on Thursday, in the Economist, about the colonization of Madagascar by 30 women from the Malay archipelago: “The island was one of the last places on Earth to be settled, receiving its earliest migrants in the middle of the first millennium AD…”)

As a double coincidence, yesterday in the métro to Dauphine I was reading La Recherche, whose central theme this month is heredity beyond genetics. (Double because this also connected with the meeting in London.) The keyword is epigenetics, namely the activation or inactivation of a gene and the hereditary transmission of this character without a genetic mutation. This is quite interesting as it implies the heritability of some acquired traits, i.e. it forces one to reconsider the nature versus nurture debate. (This phrase is another input due to Galton!) It also implies that a much faster rate of species differentiation due to environmental changes (than the purely genetic one) is possible, which may sound promising in the light of the fast climate changes we are currently facing. However, what I do not understand is why the journal included a paper on the consequences of epigenetics on the Darwinian theory of evolution and… intelligent design. Indeed, I do not see why the inclusion of different vectors in the hereditary process would contradict Darwin’s notion of natural selection. Or even why considering a scientific modification or replacement of the current Darwinian theory of evolution would be an issue. Charles Darwin wrote his book in 1859, prior to the start of genetics, and the immense advances made since then have led to modifications and adjustments of his original views, without involving any irrational belief in the process.

## Evidence and evolution (5)

Posted in Books, Statistics on April 29, 2010 by xi'an

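“Tout étant fait pour une fin, tout est nécessairement pour la meilleure fin. Remarquez bien que les nez ont été faits pour porter des lunettes, aussi avons-nous des lunettes.” [“Everything being made for an end, everything is necessarily for the best end. Note that noses were made to bear spectacles, and so we have spectacles.”] Voltaire, Candide, Chapter 1.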

I am now done with my review of Sober’s Evidence and Evolution: The Logic Behind the Science. Posting about each chapter along the way helped me a lot in writing up the review over the past few days. Its conclusion is that

Evidence and Evolution is very well-written, with hardly any typos (the unbiasedness property of AIC is stated at the bottom of page 101 with the expectation symbol E on the wrong side of the equation, Figure 3.8c is used instead of Figure 3.7c on page 204, Figure 4.7 is used instead of Figure 4.8 on page 293, Simon Tavaré’s name is always spelled Taveré, vaules rather than values is repeated four times on page 339). The style is sometimes too light and often too verbose, with an abundance of analogies that I regard as sidetracking, but this makes for easier reading (except for the sentence “the key to answering the second question is that the observation that X = 1 and Y = 1 produces stronger evidence favoring CA over SA the lower the probability is that the ancestors postulated by the two hypotheses were in state 1”, on page 314, that still eludes me!). As detailed in this review, I have points of contention with the philosophical views about testing in Evidence and Evolution as well as with the methods exposed therein, but this does not detract from the appeal of reading the book. (The lack of completely worked out statistical hypotheses in realistic settings remains the major issue in my criticism of the book.) While the criticisms of the Bayesian paradigm are often shallow (like the one on page 97 ridiculing Bayesians drawing inference based on a single observation), there is nothing fundamentally wrong with the statistical foundations of the book. I therefore repeat my earlier recommendation in favour of Evidence and Evolution, Chapters 1 and (paradoxically) 5 being the easier entries. Obviously, readers familiar with Sober’s earlier papers and books will most likely find a huge overlap with those, but others will gather Sober’s viewpoints on the notion of testing hypotheses in a (mostly) unified perspective.

And, as illustrated by the above quote, I found the sentence from Voltaire’s Candide I wanted to include. Of course, this 12-page review may be overly long for the journal it was intended for, Human Genetics, in which case I will have to find another outlet for the current arXived version. But I enjoyed reading this book with a pencil and gathered enough remarks along the way to fill those twelve pages.

## Evidence and evolution (4)

Posted in Books, Statistics on April 26, 2010 by xi'an

“Darwinians would not be satisfied if all life on Earth derived from the same large slab of rock.” (E&E, p.269)

Thanks to Eyjafjallajökull, I used the three and a half hours in the train back from Marseille to conclude my reading of Sober’s Evidence and Evolution: The Logic Behind the Science. The final chapter (apart from the concluding summary) is about “Common ancestry” and may be the most statistically oriented of the three chapters about evolution. This is not to say the chapter is without flaws, including in particular a certain tendency to repeat the same arguments, but this is somehow the chapter I appreciated the most. The chapter starts with a detailed analysis of how the hypothesis of common ancestry should be set, the main distinction being between one organism and several, while pointing out the confusing effect of lateral gene transfer. Inference about phylogenetic trees and the use of genetic sequences rather than simplistic traits get us closer to the true issues at stake. Another interesting feature of this chapter is the relation to Darwin’s reflections on the common origin of life on Earth through many quotes.

“If those prior probabilities are obscure, the same will be true of the posterior probabilities.” (E&E, p.277)

The statistical issue is thus one of testing for a common ancestor versus separate ancestors for a set of organisms. The nature of the information contained in the data is never made precise enough to understand whether this fits the principle of total evidence stressed throughout the book. The chapter also shows a more lenient disposition towards Bayesian solutions, but Section 4.3 ends up with an impossibility statement: no objective prior can be defined, because Sober wants prior probabilities that have some authority. This is a self-defeating constraint, since it amounts to requiring empirically well-grounded priors.

“Those propositions suffice for similarity to be evidence for common ancestry, and they have broad applicability.” (E&E, p.283)

The part about Reichenbach’s (1956) sufficient condition for a common trait to induce a likelihood ratio larger than one in favour of the common ancestor hypothesis needs to be discussed, as this is the point I find the most puzzling in the chapter. Indeed, most of the nine assumptions of Reichenbach (1956) relate both models under comparison, i.e. common ancestry versus separate ancestry. This seems to me to be a weird thing to do, as models under comparison should not share all of their parameters! For instance, if we build a Bayesian model to compare those models, we would use a prior distribution on each group of parameters. Having a common parameter does not make sense since we end up selecting one of the two models. I wonder if this is the result of a reluctance to have true parameters as in a regular statistical analysis. (See, e.g., the lament that “until values for adjustable parameters are specified, we cannot talk about the probability of the data under different hypotheses”, p.338.) What is striking is the reliance of the whole chapter on this unnatural set of hypotheses, since it keeps resurfacing throughout the chapter. Sober writes that Propositions 1-9 are not consequences of the axioms of probability. Neither are they necessary conditions for common ancestry to have a higher likelihood than separate ancestry (p.283). Nonetheless, this is creating an unnecessary bias in the perception of the problem, which may induce critics of evolution to reject the whole approach.

“If there was no such common ancestor, what would alignment ever mean?” (E&E, p.291)

The theme of the missing model I have alluded to in the previous posts is also recurrent in this chapter. There are a lot of paragraphs about the choice of the representation of the difference between two species, from trait to gene sequence, and the author acknowledges that the difficulty in this choice has to do with a requirement for a more advanced theoretical representation (model) adapted to more complex data. This sounds rather obvious stated that way, but the book wanders around this point for pages! (An example is the above quote, which misses the point about sequence alignment: this is a perfectly well-defined measure of distance, common ancestor or not.) And the overall conclusion is a vague call for the principle of total evidence (which is a rephrasing of the likelihood principle). As illustrated in the section on multiple characters, the discussion is confusing without a proper model. It is only on page 300 of the book that a completely defined model for the evolution of a dichotomous trait (i.e. the simplest possible case) appears. This model is a rather crude tool, as it depends on arbitrary calibration factors like $P(Z=0)=0.99$ instead of 1 and, more importantly, on an unspecified time (as in “what time is it on the evolution clock?”). The corresponding likelihood ratio is then (under one of the selection schemes)

$\dfrac{0.01b_t^2 + 0.99}{[0.01b_t+0.99]^2}$

where the dependence on those factors is obvious. This illustrates the impossibility of reaching a satisfactory conclusion without first going through a statistical analysis of the problem.
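To make this dependence concrete, here is a minimal Python sketch that evaluates the above ratio over a small grid. The function name `likelihood_ratio` and the argument `w` are my own labels rather than Sober's: `w` stands for the 0.01 weight in the formula (which I assume to be $1-P(Z=0)$), and the grids of values for $b_t$ and `w` are arbitrary choices made for illustration only.

```python
def likelihood_ratio(b_t, w=0.01):
    """Ratio (w*b_t^2 + 1 - w) / (w*b_t + 1 - w)^2 from the formula above.

    b_t : time-dependent probability appearing in the formula (in [0, 1])
    w   : calibration weight, 0.01 in the post (assumed to be 1 - P(Z=0))
    """
    numerator = w * b_t ** 2 + (1.0 - w)
    denominator = (w * b_t + (1.0 - w)) ** 2
    return numerator / denominator


if __name__ == "__main__":
    # Arbitrary illustrative grids: neither is taken from the book.
    for w in (0.01, 0.1, 0.5):
        for b_t in (0.0, 0.25, 0.5, 0.75, 1.0):
            print(f"w={w:4.2f}  b_t={b_t:4.2f}  ratio={likelihood_ratio(b_t, w):.4f}")
```

The ratio equals 1 when $b_t=1$ and never falls below 1, since the numerator minus the denominator is $w(1-w)(1-b_t)^2$. How far above 1 it sits is driven entirely by the calibration weight and by the unspecified time through $b_t$.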

“It is possible for data to discriminate among a set of hypotheses without saying anything about a proposition that is common to all the alternatives considered” (E&E, p.315)

The debate about phylogenetic tree reconstruction versus the test for common ancestry (Sections 4.7 and 4.8) lacks appeal for the very reason exposed above. The tree structure may be incorporated within the model(s) and integrated out in a Bayesian fashion to provide the marginal likelihood of the model(s). Although this seems to be an important issue, as illustrated by the controversy with Templeton, the opposition between likelihood inference and “cladistic” parsimony is not properly conducted in that, as a naïve reader, I cannot understand Sober’s presentation of the latter. This section is much more open to Bayesian processing, abstaining from the usual criticism about the lack of objectivity of the prior selection, but it entirely misses the ability of the Bayesian approach to integrate out the nuisance parameters, whether they are the tree topology (standard marginalisation) or the model index (model averaging). The debate about the limited meaning of statistical consistency makes the valid point that consistency only sheds light on the case when the hypothesised model is true, but extended consistency could have been considered as well, namely that the procedure will bring the hypothesised model as close as possible to the “true” model within the hypothesised family of models. What I gather from this final section is that cladistic parsimony tries to do without models (if not without assumptions), which seems to relate to Templeton’s views about Bayesian inference.
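For the record, the marginalisation over the tree topology mentioned above can be sketched as follows, in a notation that is mine and not the book's: $\tau$ denotes the tree topology, $\theta$ the remaining parameters, $x$ the data, and $i$ indexes the model (common ancestry, CA, or separate ancestry, SA), each model coming with its own priors $\pi_i$:

$m_i(x) = \sum_{\tau} \pi_i(\tau) \int f_i(x|\theta,\tau)\,\pi_i(\theta|\tau)\,\text{d}\theta$

Under these assumptions, the comparison of the two hypotheses would rely on the Bayes factor $m_{CA}(x)/m_{SA}(x)$, in which the topology no longer appears.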

Again, this is certainly the most enjoyable chapter of the book from my point of view (besides the nice recap about methods of inference in Chapter 1), even though the lack of real illustrations makes it less potent than it could be. It also shows the limitation of a philosophical debate on simplistic idealisations of the real model. The book only acknowledges on page 334 that genealogical hypotheses are composite. Better late than never, but I think that an incorporation of the parameter estimation in the inferential process would not have hurt the quality of the debate.