Archive for population genetics

MCM 2017

Posted in Statistics on July 3, 2017 by xi'an

And thus I am back in Montréal for three days, for MCM 2017, held at HEC Montréal, on the campus of Université de Montréal. My talk is predictably about ABC (what else?!), gathering diverse threads from different talks and papers:

Darwin’s radio [book review]

Posted in Books, Kids, pictures, University life on September 10, 2016 by xi'an

When in Sacramento two weeks ago, I came across the Beers Books Center bookstore, with a large collection of used and (nearly) new cheap books, and among other books I bought Greg Bear’s Darwin’s Radio. I had (rather) enjoyed another book of his, Hull Zero Three, not to mention one of his first books, Blood Music, which I read in the mid 1980’s, and the premise of this novel sounded promising, not to mention the Nebula award. The theme is of a major biological threat, apparently due to a new virus, and of the scientific unraveling of what the threat really means. (Spoiler alert!) In that respect it sounds rather similar to the (great) Crichton’s The Andromeda Strain, which is actually mentioned by some characters in this book. As is Ebola, as a sort of counterpoint (since Ebola is a deadly virus, although the epidemic in Western Africa now seems to have vanished). The biological concept exploited here is dormant DNA in non-coding parts of the genome that periodically gets awakened and induces massive steps in evolution. So massive that carriers of those mutations are killed by locals. Until the day it happens in a fully connected world and the mutation can no longer be stopped. The concept is compelling if not completely convincing, of course, while the outcome of a new human race, which is to Homo Sapiens what Homo Sapiens was to Neanderthal, is rather disappointing. (How could it be otherwise?!) But I did appreciate the postulate of a massive and immediate change in the genome, even though the details were disputable and the dismissal of Dawkins’ perspective poorly defended. From a stylistic perspective, the writing is at times heavy, while there are too many chance occurrences, like the main character happening to be in Georgia for a business deal (spoilers, spoilers!) at the time mass graves are being opened, or the second main character coming upon a couple of Neanderthal mummies with a Sapiens baby, or yet this pair of main characters falling in love and delivering a live mutant baby girl. But I enjoyed reading it between San Francisco and Melbourne, at the cost of a few hours of lost sleep and work. It is a page turner, no doubt! I also liked the political undercurrents, from riots to emergency measures, to an effective dictatorship controlling pregnancies and detaining newborns and their mothers.

One important thread in the book deals with anthropological digs running up against Native claims to corpses and general opposition to such digs. This reminded me of a very recent article in Nature where a local Indian tribe had claimed rights to several-thousand-year-old skeletons, whose DNA was then shown to be more closely related to far-away groups than to the claimants. But the tribe was still granted the last word, in a rather worrying jurisprudence.

the new version of abcrf

Posted in pictures, R, Statistics, University life on June 7, 2016 by xi'an

A new version of the R package abcrf was posted on Friday by Jean-Michel Marin, in conjunction with the recent arXival of our paper on point estimation via ABC and random forests. The new R functions supplement the existing ones by implementing ABC point estimation:

  1. covRegAbcrf, which predicts the posterior covariance between two response variables, given a new dataset of summaries;
  2. plot.regAbcrf, which provides a variable importance plot;
  3. predict.regAbcrf, which predicts the posterior expectation, median, variance, and quantiles for a given parameter and a new dataset;
  4. regAbcrf, which produces a regression random forest from a reference table, aimed at predicting the posterior expectation, variance, and quantiles for a parameter;
  5. snp, a simulated example in population genetics used as reference table in our Bioinformatics paper.

Unfortunately, we could not directly produce a diyabc2abcrf function for translating a regular DIYABC output into a proper abcrf format, since the translation has to occur within DIYABC instead. And even this is not a straightforward move (to be corrected in the next version of DIYABC).
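For readers unfamiliar with the reference-table idea behind these functions, here is a minimal Python sketch of ABC point estimation. It is not the abcrf API and does not use random forests: a plain nearest-neighbour average stands in for the regression forest, and the toy Normal model, the Uniform(-5, 5) prior, and all function names are hypothetical choices made for illustration only.

```python
import random
import statistics


def simulate_summaries(theta, n_obs, rng):
    """Toy model (an assumption): n_obs draws from N(theta, 1),
    summarised by their mean and variance."""
    sample = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
    return (statistics.fmean(sample), statistics.pvariance(sample))


def build_reference_table(n_sim, n_obs, rng):
    """Draw parameters from the prior and simulate summaries for each draw."""
    table = []
    for _ in range(n_sim):
        theta = rng.uniform(-5.0, 5.0)  # hypothetical Uniform(-5, 5) prior
        table.append((theta, simulate_summaries(theta, n_obs, rng)))
    return table


def knn_posterior_mean(table, observed, k):
    """Average the k parameter values whose simulated summaries are closest
    (in squared Euclidean distance) to the observed summaries."""
    def sq_dist(summaries):
        return sum((s - o) ** 2 for s, o in zip(summaries, observed))

    nearest = sorted(table, key=lambda row: sq_dist(row[1]))[:k]
    return statistics.fmean([theta for theta, _ in nearest])


rng = random.Random(42)
table = build_reference_table(n_sim=5000, n_obs=20, rng=rng)
observed = simulate_summaries(2.0, 20, rng)  # pseudo-observed data, theta = 2
estimate = knn_posterior_mean(table, observed, k=50)
print(round(estimate, 2))
```

The design point this shares with the random forest approach is that the regression is learned once from the reference table of simulated (parameter, summaries) pairs and then evaluated at the observed summaries. A forest additionally copes with many, possibly uninformative, summaries without requiring a distance or tolerance to be chosen.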

postdoc on Bayesian computation for statistical genomics

Posted in Kids, Statistics, Travel, University life on February 24, 2016 by xi'an

[An opportunity to work with Richard Everitt in Reading, UK, in a postdoc position starting this summer]

It is now possible to retrieve the complete DNA sequence of a bacterial strain relatively quickly and cheaply, and population genetics has been revolutionised in the past ten years through the availability of these data. To gain a deep understanding of sequence data, model-based statistical techniques are required. However, current approaches for performing inference in these models do not scale to whole genome sequence data. The BBSRC project “Understanding recombination through tractable statistical analysis of whole genome sequences” aims to address this issue. A position as Post-Doctoral Research Assistant is available on this project, supervised by Dr Richard Everitt in the Statistics group at the Department of Mathematics & Statistics at the University of Reading.

The deadline for applications is March 31, 2016 (details).

JSM 2015 [day #3]

Posted in Books, Statistics, University life on August 12, 2015 by xi'an

My first morning session was about inference for phylogenies. While I was expecting some developments around the Kingman’s coalescent models my coauthors needed and developed ABC for, I was surprised to see models that produced closed-form (or close enough to closed-form) likelihoods, due to strong restrictions on the population sizes and migration possibilities, as Vladimir Minin later explained to me. No need for ABC there, since MCMC was working on the species trees, with Vladimir Minin making use of [the Savage Award winner] Vinayak Rao’s approach on trees that differ from the coalescent. And there was enough structure to even consider and demonstrate tree identifiability in Laura Kubatko’s case.

I then stopped by the astrostatistics session, as the first talk, by Gwendolyn Eadie, was about galaxy mass estimation, a problem I may actually be working on in the Fall. It ended up being a completely different problem, though, and I was further surprised that the authors did not consider whether or not the data was missing at random.

Christening a session Unifying foundation(s) may be calling for trouble, at least from me! In this spirit, Xiao-Li Meng gave a talk attempting a sort of unification of the frequentist, Bayesian, and fiducial paradigms by introducing the notion of personalized inference, which is a notion I had vaguely thought of in the past. How much or how far do you condition upon? However, I have never thought of this as justifying fiducial inference in any way, and Xiao-Li’s lively arguments during and after the session were not enough to convince me of the opposite: prior-free does not translate into (arbitrary) choice-free. In the earlier talk about confidence distributions by Regina Liu and Minge Xie, which I partly missed for Galactic reasons, I entered the room at the very time when ABC was briefly described as a confidence distribution because it was not producing a convergent approximation to the exact posterior, a logic that escapes me (unless those confidence distributions are described in such a loose way as to include about any method of inference). Dongchu Sun also gave us a crash course on reference priors, with a notion of random posteriors I had not heard of before… As well as constructive posteriors… (They seemed to mean constructible matching priors as far as I understood.)

The final talk in this session, by Chuanhai Liu, on a new approach (modestly!) called inferential model, was incomprehensible, with the speaker repeatedly stating that the principles were too hard to explain in five minutes and needed a forthcoming book… I later took a brief look at an associated paper, which relates to fiducial inference and to Dempster’s belief functions. For me, it has the same Münchhausen feeling of creating a probability out of nothing, creating a distribution on the parameter by ignoring the fact that the fiducial equation x=a(θ,u) modifies the distribution of u once x is observed.
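A toy location model may make this last objection concrete. The model a(θ,u)=θ+u and the N(0, τ²) prior below are illustrative assumptions, not taken from the talk or from the associated paper; the prior is introduced only so that conditioning on x is well defined.

```latex
\begin{align*}
x &= \theta + u, \qquad u \sim \mathcal{N}(0,1)
  && \text{(fiducial equation with } a(\theta,u)=\theta+u\text{)} \\
\theta &= x - u \;\Rightarrow\; \theta \sim \mathcal{N}(x, 1)
  && \text{(fiducial step: } u \text{ kept as } \mathcal{N}(0,1) \text{ after observing } x\text{)} \\
u \mid x &\sim \mathcal{N}\!\left(\frac{x}{1+\tau^2},\; \frac{\tau^2}{1+\tau^2}\right)
  && \text{(conditioning, with } \theta \sim \mathcal{N}(0,\tau^2) \text{ independent of } u\text{)}
\end{align*}
```

Under these assumptions, the conditional distribution of u given x coincides with N(0,1) only in the limit τ → ∞, that is, under a flat prior on θ. This is one way of reading the remark that observing x modifies the distribution of u.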