Archive for ABC

ABC & the eighth plague of Egypt [locusts in forests]

Posted in Books, pictures, Statistics, Travel, University life on April 6, 2021 by xi'an

“If you refuse to let them go, I will bring locusts into your country tomorrow. They will cover the face of the ground so that it cannot be seen. They will devour what little you have left after the hail, including every tree that is growing in your fields. They will fill your houses and those of all your officials and all the Egyptians.” Exodus 10:3-6

Marie-Pierre Chapuis, Louis Raynal, and co-authors, mostly from Montpellier, published a paper last year on the evolutionary history of the African arid-adapted pest locust, Schistocerca gregaria, called the eighth plague of Egypt in the Bible. And a cause of a major food disaster in East Africa over the past months. The analysis was run with ABC-RF techniques. The paper was first reviewed in PCI Evolutionary Biology, with the following points:

The present-day distribution of extant species is the result of the interplay between their past population demography (e.g., expansion, contraction, isolation, and migration) and adaptation to the environment (…) The understanding of the key factors driving species evolution gives important insights into how the species may respond to changing conditions, which can be particularly relevant for the management of harmful species, such as agricultural pests.

Meaningful demographic inferences present major challenges. These include formulating evolutionary scenarios fitting species biology and the eco-geographical context and choosing informative molecular markers and accurate quantitative approaches to statistically compare multiple demographic scenarios and estimate the parameters of interest. A further issue comes with result interpretation. Accurately dating the inferred events is far from straightforward since reliable calibration points are necessary to translate the molecular estimates of the evolutionary time into absolute time units (i.e. years). This can be attempted in different ways (…) Nonetheless, most experimental systems rarely meet these conditions, hindering the comprehensive interpretation of results.

The contribution of Chapuis et al. addresses these issues to investigate the recent history of the (…) desert locust (…) Owing to their fast mutation rate microsatellite markers offer at least two advantages: i) suitability for analyzing recently diverged populations, and ii) direct estimate of the germline mutation rate in pedigree samples (…) The main aim of the study is to infer the history of divergence of the two subspecies of the desert locust, which have spatially disjoint distribution corresponding to the dry regions of North and West-South Africa. They first use paleo-vegetation maps to formulate hypotheses about changes in species range since the last glacial maximum. Based on them, they generate 12 divergence models. For the selection of the demographic model and parameter estimation, they apply the recently developed ABC-RF approach (…) Some methodological novelties are also introduced in this work, such as the computation of the error associated with the posterior parameter estimates under the best scenario (…) The best-supported model suggests a recent divergence event of the subspecies of S. gregaria (around 2.6 kya) and a reduction of populations size in one of the subspecies (S. g. flaviventris) that colonized the southern distribution area. As such, results did not support the hypothesis that the southward colonization was driven by the expansion of African dry environments associated with the last glacial maximum (…) The estimated time of divergence points at a much more recent origin for the two subspecies, during the late Holocene, in a period corresponding to fairly stable arid conditions similar to current ones. Although the authors cannot exclude that their microsatellite data bear limited information on older colonization events than the last one, they bring arguments in favour of alternative explanations. 
The hypothesis privileged does not involve climatic drivers, but the particularly efficient dispersal behaviour of the species, whose individuals are able to fly over long distances (up to thousands of kilometers) under favourable windy conditions (…)

There is a growing number of studies in phylogeography in arid regions in the Southern hemisphere, but the impact of past climate changes on the species distribution in this region remains understudied relative to the Northern hemisphere. The study presented by Chapuis et al. offers several important insights into demographic changes and the evolutionary history of an agriculturally important pest species in Africa, which could also mirror the history of other organisms in the continent (…)

Microsatellite markers have been offering a useful tool in population genetics and phylogeography for decades (…) This study reaffirms the usefulness of these classic molecular markers to estimate past demographic events, especially when species- and locus-specific microsatellite mutation features are available and a powerful inferential approach is adopted. Nonetheless, there are still hurdles to overcome, such as the limitations in scenario choice associated with the simulation software used (e.g. not allowing for continuous gene flow in this particular case), which calls for further improvement of simulation tools allowing for more flexible modeling of demographic events and mutation patterns. In sum, this work not only contributes to our understanding of the makeup of the African biodiversity but also offers a useful statistical framework, which can be applied to a wide array of species and molecular markers.



likelihood-free and summary-free?

Posted in Books, Mountains, pictures, Statistics, Travel on March 30, 2021 by xi'an

My friends and coauthors Chris Drovandi and David Frazier have recently arXived a paper entitled A comparison of likelihood-free methods with and without summary statistics, in which they indeed compare these two perspectives on approximate Bayesian methods like ABC and Bayesian synthetic likelihoods.

“A criticism of summary statistic based approaches is that their choice is often ad hoc and there will generally be an inherent loss of information.”

In ABC methods, the recourse to a summary statistic is often advocated as a “necessary evil” against the greater evil of the curse of dimensionality, paradoxically providing a faster convergence of the ABC approximation (Fearnhead & Liu, 2018). The authors propose a somewhat generic selection of summary statistics based on [my undergrad mentors!] Gouriéroux’s and Monfort’s indirect inference, using a mixture of Gaussians as their auxiliary model. Summary-free solutions, as in our Wasserstein papers, rely on distances between distributions, hence on functional distances, which can be seen as dimension-free as well (or criticised as infinite dimensional). Chris and David consider energy distances (which sound very much like standard distances, except for averaging over all permutations), maximum mean discrepancy as in Gretton et al. (2012), Cramér-von Mises distances, and Kullback-Leibler divergences estimated via one-nearest-neighbour formulas, for a univariate sample. I am not aware of any degree of theoretical exploration of these functional approaches towards the precise speed of convergence of the ABC approximation…
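
To make the summary-free idea concrete, here is a minimal sketch of ABC rejection where the discrepancy is an energy distance computed directly between the observed and simulated univariate samples, with no summary statistic. The toy Normal location model, the uniform prior, the 5% acceptance fraction and all function names are my own assumptions for illustration, not the experiments or code of the paper.

```python
import numpy as np

def energy_distance(x, y):
    """V-statistic estimate of the energy distance between two 1-D samples:
    2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * xy - xx - yy

def abc_rejection(observed, simulate, prior_draw, n_sims, keep_frac, rng):
    """Keep the prior draws whose simulated samples are closest to the data."""
    thetas = np.array([prior_draw(rng) for _ in range(n_sims)])
    dists = np.array([
        energy_distance(observed, simulate(t, len(observed), rng))
        for t in thetas
    ])
    n_keep = max(1, int(keep_frac * n_sims))
    return thetas[np.argsort(dists)[:n_keep]]

# Hypothetical toy setting: data from N(2, 1), unknown location theta.
rng = np.random.default_rng(1)
observed = rng.normal(2.0, 1.0, size=100)
accepted = abc_rejection(
    observed,
    simulate=lambda theta, n, rng: rng.normal(theta, 1.0, size=n),
    prior_draw=lambda rng: rng.uniform(-5.0, 5.0),
    n_sims=2000,
    keep_frac=0.05,
    rng=rng,
)
print("ABC posterior mean:", accepted.mean(), "sample mean:", observed.mean())
```

Selecting a fixed fraction of the simulations amounts to choosing the tolerance as an empirical quantile of the distances, a common practical choice; swapping `energy_distance` for another sample-based distance leaves the rest of the sketch unchanged.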

“We found that at least one of the full data approaches was competitive with or outperforms ABC with summary statistics across all examples.”

The main part of the paper, besides a survey of the existing solutions, is a comparison of their performances over a few chosen (univariate) examples, with the exact posterior as the gold standard. In the g & k model, the Pima Indian benchmark of ABC studies!, Cramér-von Mises does somewhat better, while it does much worse in an M/G/1 example (where Wasserstein does better, and similarly for a stereological extremes example of Bortot et al., 2007). An ordering reversed again for a toad movement model I had not seen before. While the usual proviso applies, namely that this is a simulation study on unidimensional data and a small number of parameters, the design of the four comparison experiments is very careful, eliminating versions that are either too costly or too divergent, although this could be potentially criticised for being unrealistic (i.e., when the true posterior is unknown). The computing time is roughly the same across methods, which essentially removes the call to kernel based approximations of the likelihood. Another point of interest is that the distance methods are significantly impacted by transforms on the data, which should not be so for intrinsic distances! Demonstrating the distances are not intrinsic…
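
For readers unfamiliar with the g & k benchmark, the model is defined through its quantile function rather than its density, which is why it is easy to simulate and awkward to evaluate. The sketch below uses the standard parameterisation with the conventional constant c = 0.8; the function names and the parameter values in the usage lines are illustrative assumptions, not the settings of the paper.

```python
import numpy as np

def gk_quantile(z, a, b, g, k, c=0.8):
    """g-and-k quantile function evaluated at standard Normal quantiles z.

    a: location, b: scale, g: skewness, k: kurtosis; c = 0.8 by convention.
    """
    return a + b * (1.0 + c * np.tanh(g * z / 2.0)) * (1.0 + z**2) ** k * z

def simulate_gk(n, a, b, g, k, rng):
    """Simulate n draws by pushing standard Normal draws through the quantile map."""
    return gk_quantile(rng.standard_normal(n), a, b, g, k)

# Illustrative parameter values (an assumption, chosen only for the demo).
rng = np.random.default_rng(0)
sample = simulate_gk(1000, a=3.0, b=1.0, g=2.0, k=0.5, rng=rng)
print(np.median(sample))  # the median of the model is a, since z = 0 maps to a
```

With g = 0 and k = 0 the map reduces to a + b z, i.e., a Normal distribution with mean a and standard deviation b, which gives a quick sanity check on the implementation.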


Posted in Statistics on March 24, 2021 by xi'an

An AISTATS 2021 paper by Masahiro Fujisawa, Takeshi Teshima, Issei Sato and Masashi Sugiyama (RIKEN, Tokyo) just appeared on arXiv. (AISTATS 2021 is again virtual this year.)

“ABC can be sensitive to outliers if a data discrepancy measure is chosen inappropriately (…) In this paper, we propose a novel outlier-robust and computationally-efficient discrepancy measure based on the γ-divergence”

The focus is on measures of robustness for ABC distances, as those can be lethal if insufficient summarisation is used. (Note that a referenced paper by Erlis Ruli, Nicola Sartori and Laura Ventura from Padova appeared last year on robust ABC.) The current approach mixes the γ-divergence of Fujisawa and Eguchi with a k-nearest neighbour density estimator. Which may not prove too costly, of order O(n log n), but also may be a poor if robust approximation, even if it provides an asymptotically unbiased and almost surely convergent approximation. These properties are those established in the paper, which only demonstrates convergence in the sample size n to an ABC approximation with the true γ-divergence but with a fixed tolerance ε, when the most recent results are rather concerned with the rates of convergence of ε(n) to zero. (An extensive simulation section compares this approach with several ABC alternatives, incl. ours using the Wasserstein distance. If I read the comparison graphs properly, it does not look as if there is a huge discrepancy between the two approaches under no contamination.) Incidentally, the paper contains a substantial survey section and has a massive reference list, if missing the publication more than a year earlier of our Wasserstein paper in Series B.
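
As a rough illustration of the ingredients, the sketch below plugs univariate k-nearest-neighbour density estimates into the γ-divergence of Fujisawa and Eguchi, D_γ(g,f) = log∫g^{1+γ}/(γ(1+γ)) − log∫g f^γ/γ + log∫f^{1+γ}/(1+γ), replacing each integral by a sample average. This is my own plug-in version under stated assumptions (one-dimensional data, k set to the square root of the sample size, brute-force distances costing O(n²) rather than O(n log n)), not the estimator or code of the paper.

```python
import numpy as np

def knn_density_1d(points, sample, k, exclude_self=False):
    """k-NN estimate of the density of `sample`, evaluated at `points` (1-D).

    When `points` is `sample` itself, set exclude_self=True so that the zero
    distance of each point to itself is skipped.
    """
    dists = np.sort(np.abs(points[:, None] - sample[None, :]), axis=1)
    idx = k if exclude_self else k - 1
    n_eff = len(sample) - 1 if exclude_self else len(sample)
    radius = dists[:, idx]               # distance to the k-th neighbour
    return k / (n_eff * 2.0 * radius)    # k / (n * length of the k-NN interval)

def gamma_divergence(x, y, gamma=0.5, k=None):
    """Plug-in gamma-divergence between the densities behind samples x and y."""
    if k is None:
        k = max(1, int(np.sqrt(min(len(x), len(y)))))  # heuristic choice of k
    g_at_x = knn_density_1d(x, x, k, exclude_self=True)
    f_at_x = knn_density_1d(x, y, k)
    f_at_y = knn_density_1d(y, y, k, exclude_self=True)
    log_gg = np.log(np.mean(g_at_x ** gamma))  # log of E_g[g^gamma]
    log_gf = np.log(np.mean(f_at_x ** gamma))  # log of E_g[f^gamma]
    log_ff = np.log(np.mean(f_at_y ** gamma))  # log of E_f[f^gamma]
    return log_gg / (gamma * (1.0 + gamma)) - log_gf / gamma + log_ff / (1.0 + gamma)

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=400)
d_same = gamma_divergence(x, rng.normal(0.0, 1.0, size=400))
d_far = gamma_divergence(x, rng.normal(3.0, 1.0, size=400))
print(d_same, d_far)
```

The finite-sample estimate can be slightly negative when both samples come from the same distribution, a reminder that the k-NN plug-in is biased for fixed n.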

Metropolis-Hastings via classification

Posted in pictures, Statistics, Travel, University life on February 23, 2021 by xi'an

Veronika Ročková (from Chicago Booth) gave a talk on this theme at the Oxford Stats seminar this afternoon. Starting with a survey of ABC, synthetic likelihoods, and pseudo-marginals, to motivate her approach via GANs, learning an approximation of the likelihood from the GAN discriminator. Her explanation for the GAN type estimate was crystal clear and made me wonder at the connection with Geyer’s 1994 logistic estimator of the likelihood (a form of discriminator with a fixed generator). She also expressed the ABC approximation hence created as the actual posterior times an exponential tilt. Which she proved is of order 1/n. And that a random variant of the algorithm (where the shift is averaged) is unbiased. Most interestingly requiring no calibration and no tolerance. Except indirectly when building the discriminator. And no summary statistic. Noteworthy tension between correct shape and correct location.
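
The link between a discriminator and a likelihood ratio can be checked on a toy case: a classifier trained on equal-sized samples from two densities p and q has odds D(x)/(1−D(x)) approximating p(x)/q(x). The sketch below fits a logistic regression by Newton's method on samples from N(θ,1) and N(0,1), for which the true log-ratio θx − θ²/2 is exactly linear in x. The model, sample sizes, and function names are assumptions of mine for illustration, this is not the algorithm presented in the talk.

```python
import numpy as np

def fit_logistic(x, labels, n_iter=25):
    """Logistic regression of labels on (1, x), fitted by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))           # current class-1 probabilities
        grad = X.T @ (labels - p)                  # gradient of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        w = w + np.linalg.solve(hess, grad)
    return w

rng = np.random.default_rng(0)
theta, n = 1.0, 5000
from_p = rng.normal(theta, 1.0, size=n)   # label 1: draws from N(theta, 1)
from_q = rng.normal(0.0, 1.0, size=n)     # label 0: draws from N(0, 1)
x = np.concatenate([from_p, from_q])
labels = np.concatenate([np.ones(n), np.zeros(n)])

intercept, slope = fit_logistic(x, labels)

def log_ratio(t):
    """Classifier-based estimate of log p(t) / q(t), i.e. the fitted logit."""
    return intercept + slope * t

# True log-ratio is theta * t - theta**2 / 2, so expect slope near 1, intercept near -0.5.
print(intercept, slope)
```

In this toy case the logistic model is well specified, which is what makes the recovered logit a consistent estimate of the log-ratio; with a misspecified discriminator the approximation error carries over to the likelihood estimate.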

why should I pay $350 for an on-line conference?!

Posted in Statistics, Travel, University life on February 11, 2021 by xi'an

Last year, I was invited to the SIAM-CSE20 conference for a session on ABC, in München, and was about ready to leave when the conference was postponed for pandemic reasons. My hotel in Garching charged me the entire stay, Danke sehr!, and I am uncertain about having been fully reimbursed for the reservation. The conference is taking place this year as an on-line meeting and I got re-invited. However, when trying to register, I found that the fees were the same as last year. And did not see the point, given the possibility of delivering a high quality virtual conference for free!, except to support SIAM. Hence withdrew my participation from the meeting, which was not particularly high on my list anyway… (On the same day I received another invitation for an Insurance conference, but they got me confused with my namesake!)