There will now be a second mirror workshop of ABC in Grenoble, taking place at the Université de Montpellier, more precisely at the Alexander Grothendieck Montpellier Institute, Building 9, room 430 (4th floor), Triolet Campus. It is organised by my friend Jean-Michel Marin. Great to see a mirror at one of the major breeding places of ABC, where I personally heard of ABC for the first time and met several of the main A[B]Ctors..! The dates are 19-20 March, with talks transmitted from 9am to 5pm [GMT+1]. Since the video connection can accommodate 18 more mirrors, if anyone else is interested in organising another mirror, please contact me for technical details.

## Archive for population genetics

## another mirror of ABC in Gre[e]noble

Posted in Statistics with tags ABC, ABC in Grenoble, Alexandre Grothendieck, Approximate Bayesian computation, Grenoble, INRA, likelihood-free methods, mirror workshop, population genetics, Université de Montpellier, videoconference on March 3, 2020 by xi'an

## down with Galton (and Pearson and Fisher…)

Posted in Books, Statistics, University life with tags Annals of Eugenics, Biometrika, eugenics, Francis Galton, Genetics, history of statistics, honours, Karl Pearson, London, physiognomy, population genetics, R.A. Fisher, racism, Stephen Stigler, UCL, University College London on July 22, 2019 by xi'an

**I**n the last issue of Significance, which I read in Warwick prior to the conference, there is a most interesting article on Galton’s eugenics, his heritage at University College London (UCL), and the overall trouble with honouring prominent figures of the past with memorials like named buildings or lectures… The starting point of this debate is a protest from some UCL students and faculty about UCL having a lecture room named after the late Francis Galton, who was a professor there and who further donated most of his fortune at his death to the university towards creating a professorship in eugenics. The protests are about Galton’s involvement in the eugenics movement of the late 19th and early 20th century, as well as his professing racist opinions.

My first reaction after reading about these protests was *why not?!* Named places or lectures, as well as statues and other memorials, have a limited utility, especially when the named person is long dead, and they certainly do not contribute to making a scientific theory [associated with the said individual] more appealing or more valid. And since “humans are [only] humans”, to quote Stephen Stigler speaking in this article, it is unrealistic to expect great scientists to be perfect, all the more so if one multiplies the codes for ethical or acceptable behaviours across ages and cultures. It is also more rational to use amphitheater MS.02 and lecture room AC.18 rather than associate them with one name chosen out of many alumni’s or former professors’.

Predictably, another reaction of mine was *why bother?!*, as removing Galton’s name from the items it is attached to is highly unlikely to change current views on eugenism or racism. On the contrary, it seems to detract from opposing the present versions of these ideologies, such as some recent proposals linking genes and some form of academic success. Another of my (multiple) reactions was that, as stated in the article, these views of Galton’s reflected the views and prejudices of the time, when the notions of races and inequalities between races (as well as genders and social classes) were almost universally accepted, including in scientific publications like the proceedings of the Royal Society and Nature, and when Karl Pearson launched the Annals of Eugenics in 1925 (after he started Biometrika) with the very purpose of establishing a scientific basis for eugenics. (An editorship that Ronald Fisher would later take over, along with his views on the differences between races, believing that “human groups differ profoundly in their innate capacity for intellectual and emotional development”.) Starting from these prejudiced views, Galton set up a scientific and statistical approach to support them, by accumulating data and possibly modifying some of these views. But without much empathy for the consequences, as shown in this terrible quote I found when looking for more material:

“I should feel but little compassion if I saw all the Damaras in the hand of a slave-owner, for they could hardly become more wretched than they are now…”

As it happens, my first exposure to Galton was in my first probability course at ENSAE, when a terrific professor was peppering his lectures with historical anecdotes and used to mention Galton’s data-gathering trip to Namibia, literally measuring local inhabitants to support his physiognomical views, also reflected in the above attempt of his to superpose photographs to achieve the “ideal” thief…

## ABC by QMC

Posted in Books, Kids, Statistics, University life with tags ABC, ABC-PMC, ABC-SMC, CREST, JCGS, PhD thesis, population genetics, population Monte Carlo, qMC, quasi-Monte Carlo methods, variance reduction on November 5, 2018 by xi'an

**A** paper by Alexander Buchholz (CREST) and Nicolas Chopin (CREST) on quasi-Monte Carlo methods for ABC is going to appear in the *Journal of Computational and Graphical Statistics*. I had missed the opportunity when it was posted on arXiv and only became aware of the paper’s contents when I reviewed Alexander’s thesis for the doctoral school. The fact that the parameters are simulated (in ABC) from a prior that is quite generally a standard distribution while the pseudo-observations are simulated from a complex distribution (associated with the intractability of the likelihood function) means that the use of quasi-Monte Carlo sequences is in general only possible for the first part.

The ABC context studied there is close to the original version of the ABC rejection scheme [as opposed to SMC and importance versions], the main difference lying in the use of M pseudo-observations instead of one (of the same size as the initial data). This repeated version has been discussed and abandoned in a strict Monte Carlo framework in favor of M=1 as it increases the overall variance, but the paper uses this version to show that the multiplication of pseudo-observations in a quasi-Monte Carlo framework does not increase the variance of the estimator. (Since the variance apparently remains constant when taking into account the generation time of the pseudo-data, we can however dispute the interest of this multiplication, except to produce a constant variance estimator, for some targets, or to be used for convergence assessment.) The article also covers the bias correction solution of Lee and Łatuszyński (2014).
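As a minimal sketch of the M-replicate rejection scheme described above, assuming a toy model that is not taken from the paper (a U(-3, 3) prior on a Normal mean, unit variance, the sample mean as summary statistic), each simulated parameter can be given a weight equal to the fraction of its M pseudo-datasets falling within ε of the observed summary. Function names and default values are illustrative choices only.

```python
import random
import statistics


def abc_rejection_m(y_obs, n_params=2000, m=5, eps=0.2, seed=1):
    """ABC rejection with m pseudo-datasets per simulated parameter.

    Toy assumptions (not from the paper): theta ~ U(-3, 3), the data are
    len(y_obs) iid N(theta, 1) draws, the summary is the sample mean.
    Returns (thetas, weights), where each weight is the fraction of the
    m pseudo-datasets whose summary lies within eps of the observed one.
    """
    rng = random.Random(seed)
    n = len(y_obs)
    s_obs = statistics.fmean(y_obs)
    thetas, weights = [], []
    for _ in range(n_params):
        theta = rng.uniform(-3.0, 3.0)  # draw from the prior
        hits = 0
        for _ in range(m):
            pseudo = [rng.gauss(theta, 1.0) for _ in range(n)]
            if abs(statistics.fmean(pseudo) - s_obs) <= eps:
                hits += 1
        thetas.append(theta)
        weights.append(hits / m)
    return thetas, weights


def weighted_mean(thetas, weights):
    """Self-normalised ABC estimate of the posterior mean."""
    total = sum(weights)
    return sum(t * w for t, w in zip(thetas, weights)) / total


thetas, weights = abc_rejection_m([1.0] * 20)
abc_mean = weighted_mean(thetas, weights)
```

Setting `m=1` recovers the usual accept/reject version, with weights equal to 0 or 1.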

Due to the simultaneous presence of pseudo-random and quasi-random sequences in the approximations, the authors use the notion of mixed sequences, for which they extend a one-dimension central limit theorem. The paper’s focus on the estimation of Z(ε), the normalization constant of the ABC density, i.e. the predictive probability of accepting a simulation, which can be estimated at a speed of O(N⁻¹) where N is the number of QMC simulations, is a wee bit puzzling as I cannot figure out the relevance of this constant (a function of ε), especially since the result does not seem to generalize directly to other ABC estimators.
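To illustrate what a mixed sequence looks like when estimating Z(ε), here is a hedged sketch on a toy model of my own choosing (U(-3, 3) prior on a Normal mean, sample mean of 20 observations as summary): the parameter is driven by a deterministic base-2 van der Corput sequence through the inverse prior cdf, while the pseudo-data remain pseudo-random. The paper's actual construction (and any randomisation of the QMC points) may differ, and nothing here checks the O(N⁻¹) rate.

```python
import math
import random


def van_der_corput(i, base=2):
    """i-th point (i >= 1) of the van der Corput sequence in (0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def estimate_z(s_obs, eps, n_qmc=1024, m=4, n_data=20, seed=0):
    """Estimate Z(eps), the prior predictive acceptance probability.

    Toy assumptions: theta ~ U(-3, 3); the summary is the mean of n_data
    iid N(theta, 1) draws, simulated directly as N(theta, 1/n_data).
    QMC part: theta from van der Corput points (inverse cdf of the prior).
    MC part: m pseudo-random summaries per theta.
    """
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(n_data)
    total = 0.0
    for i in range(1, n_qmc + 1):
        theta = -3.0 + 6.0 * van_der_corput(i)  # quasi-random prior draw
        hits = sum(abs(rng.gauss(theta, sd) - s_obs) <= eps for _ in range(m))
        total += hits / m
    return total / n_qmc


z_hat = estimate_z(s_obs=1.0, eps=0.2)
```

In this toy case the exact value is close to 2ε/6, since the acceptance region sits well inside the prior support, which gives a quick sanity check on `z_hat`.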

A second half of the paper considers a sequential version of ABC, as in ABC-SMC and ABC-PMC, where the proposal distribution is based on a Normal mixture with a *small* number of components, estimated from the (particle) sample of the previous iteration. Even though efficient techniques for estimating this mixture are available, this innovative step requires a calculation time that should be taken into account in the comparisons. The construction of a decreasing sequence of tolerances ε also seems pushed beyond and below what a sequential approach like that of Del Moral, Doucet and Jasra (2012) would produce, apparently on the grounds that lower tolerances are always preferable. This is not necessarily the case, as recent articles by Li and Fearnhead (2018a, 2018b) and ours have shown (Frazier et al., 2018). Overall, since ABC methods are large consumers of simulation, it is interesting to see how the contribution of QMC sequences results in the reduction of variance and to hope to see appropriate packages added for standard distributions. However, since the most consuming part of the algorithm is due to the simulation of the pseudo-data, in most cases, it would seem that the most relevant focus should be on QMC add-ons on this part, which may be feasible for models with a huge number of standard auxiliary variables as for instance in population evolution.
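For readers unfamiliar with the sequential scheme, the following sketch shows one simplified ABC-PMC loop under loudly stated assumptions: the same toy Normal-mean model as a stand-in, a single Gaussian component instead of the paper's small Normal mixture, plain pseudo-random draws instead of QMC, a variance inflation factor of 2 chosen arbitrarily, and the next tolerance taken as an unweighted quantile of the current distances (a common heuristic, not the paper's rule).

```python
import math
import random


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def abc_pmc(s_obs, n_particles=300, n_iter=4, quantile=0.5, n_data=20, seed=2):
    """Simplified ABC-PMC with a one-component Gaussian proposal.

    Toy assumptions: theta ~ U(-3, 3); the summary is N(theta, 1/n_data).
    Returns the final particles, their normalised weights, and the
    sequence of tolerances that was used.
    """
    rng = random.Random(seed)
    sd_s = 1.0 / math.sqrt(n_data)
    lo, hi = -3.0, 3.0
    # Iteration 0: prior draws, all kept, with their distances recorded.
    thetas = [rng.uniform(lo, hi) for _ in range(n_particles)]
    dists = [abs(rng.gauss(t, sd_s) - s_obs) for t in thetas]
    weights = [1.0 / n_particles] * n_particles
    eps_history = [max(dists)]
    for _ in range(n_iter):
        # Next tolerance: unweighted quantile of current distances (heuristic).
        eps = sorted(dists)[int(quantile * (n_particles - 1))]
        # Fit the Gaussian proposal to the weighted particles, then inflate it.
        mu = sum(w * t for w, t in zip(weights, thetas))
        var = sum(w * (t - mu) ** 2 for w, t in zip(weights, thetas))
        sd_q = math.sqrt(2.0 * var)
        new_thetas, new_dists, new_w = [], [], []
        while len(new_thetas) < n_particles:
            t = rng.gauss(mu, sd_q)
            if not lo <= t <= hi:
                continue  # zero prior density outside the support
            d = abs(rng.gauss(t, sd_s) - s_obs)
            if d <= eps:
                new_thetas.append(t)
                new_dists.append(d)
                # Importance weight: prior density over proposal density.
                new_w.append((1.0 / (hi - lo)) / normal_pdf(t, mu, sd_q))
        total = sum(new_w)
        weights = [w / total for w in new_w]
        thetas, dists = new_thetas, new_dists
        eps_history.append(eps)
    return thetas, weights, eps_history


particles, pmc_weights, eps_history = abc_pmc(s_obs=1.0)
pmc_mean = sum(w * t for w, t in zip(pmc_weights, particles))
```

Fitting the proposal is cheap here; with a genuine mixture, that fitting cost is precisely the overhead that should enter the comparisons.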

## MCM 2017

Posted in Statistics with tags ABC, ABC algorithm, ABC consistency, Bayesian model choice, curse of dimensionality, Hilbert curve, MCM 2017, Montréal, population genetics, Québec, random forests, summary statistics, Wasserstein distance on July 3, 2017 by xi'an

## Darwin’s radio [book review]

Posted in Books, Kids, pictures, University life with tags biological theories, Blood Music, book review, Charles Darwin, DNA, genome, Greg Bear, human ev, Human Genetics, Melbourne, Native Americans, Nature, Neanderthal, population genetics, Richard Dawkins, Sacramento, science fiction on September 10, 2016 by xi'an

**W**hen in Sacramento two weeks ago I came across the Beers Books Center bookstore, with a large collection of used and (nearly) new cheap books, and among other books I bought Greg Bear’s Darwin’s Radio. I had (rather) enjoyed another book of his, Hull Zero Three, not to mention one of his first books, Blood Music, which I read in the mid 1980’s, and the premises of this novel sounded promising, not to mention the Nebula award. The theme is of a major biological threat, apparently due to a new virus, and of the scientific unraveling of what the threat really means. (*Spoilers alert!*) In that respect it sounds rather similar to the (great) Crichton’s The Andromeda Strain, which is actually mentioned by some characters in this book. As is Ebola, as a sort of counterpoint (since Ebola is a deadly virus, although the epidemic in Western Africa now seems to have vanished). The biological concept exploited here is dormant DNA in non-coding parts of the genome that periodically gets awakened and induces massive steps in the evolution. So massive that carriers of those mutations are killed by locals. Until the day it happens in an all-connected World and the mutation can no longer be stopped. The concept is compelling if not completely convincing of course, while the outcome of a new human race, which is to Homo Sapiens what Homo Sapiens was to Neanderthal, is rather disappointing. (How could it be otherwise?!) But I did appreciate the postulate of a massive and immediate change in the genome, even though the details were disputable and the dismissal of Dawkins’ perspective poorly defended.

From a stylistic perspective, the style is at times heavy, while there are too many chance occurrences, like the main character happening to be in Georgia for a business deal (spoilers, spoilers!) at the time of the opening of collective graves, or the second main character coming upon a couple of Neanderthal mummies with a Sapiens baby, or yet this pair of main characters falling in love and delivering a live mutant baby-girl. But I enjoyed reading it between San Francisco and Melbourne, with a few hours of lost sleep and work. It is a page turner, no doubt! I also like the political undercurrents, from riots to emergency measures, to an effective dictatorship controlling pregnancies and detaining newborns and their mothers.

One important thread in the book deals with anthropology digs running up against Native claims to corpses and general opposition to such digs. This reminded me of a very recent article in Nature where a local Indian tribe had claimed rights to several thousand year old skeletons, whose DNA was then shown to be more related to far away groups than to the claimants. But where the tribe was still granted the last word, in a rather worrying jurisprudence.

## the new version of abcrf

Posted in pictures, R, Statistics, University life with tags ABC, CRAN, DIYABC, population genetics, R, random forests, regression random forest on June 7, 2016 by xi'an

**A** new version of the R package abcrf has been posted on Friday by Jean-Michel Marin, in conjunction with the recent arXival of our paper on point estimation via ABC and random forests. The new R functions come to supplement the existing ones towards implementing ABC point estimation:

- *covRegAbcrf*, which predicts the posterior covariance between two response variables, given a new dataset of summaries;
- *plot.regAbcrf*, which provides a variable importance plot;
- *predict.regAbcrf*, which predicts the posterior expectation, median, variance, quantiles for a given parameter and a new dataset;
- *regAbcrf*, which produces a regression random forest from a reference table aimed at predicting posterior expectation, variance and quantiles for a parameter;
- *snp*, a simulated example in population genetics used as reference table in our Bioinformatics paper.

Unfortunately, we could not directly produce a *diyabc2abcrf* function for translating a regular DIYABC output into a proper abcrf format, since the translation has to occur in DIYABC instead. And even this is not a straightforward move (to be corrected in the next version of DIYABC).

## postdoc on Bayesian computation for statistical genomics

Posted in Kids, Statistics, Travel, University life with tags DNA, genomics, population genetics, postdoctoral position, United Kingdom, University of Reading on February 24, 2016 by xi'an

*[An opportunity to work with Richard Everitt in Reading, UK, in a postdoc position starting this summer]*

It is now possible to retrieve the complete DNA sequence of a bacterial strain relatively quickly and cheaply, and population genetics has been revolutionised in the past ten years through the availability of these data. To gain a deep understanding of sequence data, model-based statistical techniques are required. However, current approaches for performing inference in these models do not scale to whole genome sequence data. The BBSRC project “Understanding recombination through tractable statistical analysis of whole genome sequences” aims to address this issue. A position as Post-Doctoral Research Assistant is available on this project, supervised by Dr Richard Everitt in the Statistics group at the Department of Mathematics & Statistics at the University of Reading.

The deadline for applications is March 31, 2016 (details).