## waste tide

Posted in Books, pictures, Travel on April 11, 2021 by xi'an

I presumably bought this book upon a suggestion made by the Amazon AI. It sounded quite original and interesting. And translated by Ken Liu. I had not seen the above cover, but it would have only helped. (And reminded me of the daunting and bittersweet Tales from the Loop.)

“None of this, of course, existed in the digital world. In their place were highly abstract algorithms and programs that turned the complicated messy world into a set of mathematical models and topological spaces. Like a real spiderweb, the web would be deformed by any insect that got caught into it, and the rate at which such deformation evolved exceeded the rate at which information might be transmitted under the restricted-bitrate regulations. In this world, the shortest path between two points was no longer the straight line.”

Waste Tide is immensely puzzling and definitely interesting. A Chinese form of Neuromancer… With further links to the Windup Girl. The location of the novel is a near-future island in Guiyu, China. Where the world's electronic waste ends up, to be processed and recycled by “waste people”. Who are despised by the original inhabitants of the island. And exploited by clans and American companies. Several of the main characters find themselves torn between several cultures, but these characters often sound a bit too caricatural. Just like the take-over of a “waste girl” by a residual AI is somewhat clumsy. Far from the constructs of Neuromancer or Windup Girl.

Another interesting side of the book is the translation by Ken Liu, who also translated The Three-Body Problem. As well as published short stories of his own. The preface warns about the multiple languages co-existing in China, beyond the best-known Cantonese and Mandarin, and the book includes footnotes about the proper pronunciation of some words.

## a journal of the [second] plague year

Posted in Books, Kids, pictures, Travel, University life on April 10, 2021 by xi'an

Read the picaresque El Buscón, dating from 1602-1604, in French (translated by Nicolas Restif de La Bretonne): the classic French translation from a century later is quite enjoyable and the story often hilarious. (I read this book after reading in 2019 the BD (comics) Les Indes Fourbes by Alain Ayroles and Juanjo Guarnido, which was inspired by El Buscón and purports to be its sequel, set in South America.) Also read the second volume of Olen Steinhauer, The Confession, just as impressive a dig into the minutiae of a Balkan socialist dictatorship as the first one. And into the complex mind of another militia inspector in the homicide squad. (Just wondering if there were truly paper cups in the post-war Eastern bloc!)

Made my first fresh pasta with the traditional pasta machine my daughter got me as a Xmas present! I need to improve but, despite the mess this creates (flour everywhere!), it is a real treat to eat fresh pasta. The next goal is to check if soba noodles can be made with the machine…

Watched some parts of a rather terrible Korean series, Demon Catchers (or The Uncanny Counter). With absolutely no redeeming feature, although a very popular show… And the beginning episodes of another SF Korean series, Alice, playing with time travel themes until it hit the usual paradoxes. (At least the physics formulae on the white boards looked correct, even though the grossly romanticised home office of a physics professor made no sense.)

Gave up on Augusto Cruz’ London after Midnight. Which revolves around the search for a surviving copy of the 1927 horror movie London After Midnight, made by Tod Browning, and seemingly cursed. The plot is terrible and the style awful, an unpalatable endless infodump… Read P. Djèlí Clark’s delightful short story A Dead Djinn in Cairo, which is a prequel to The Haunting of Tram Car 015, about a supernatural Cairo in the early 1900’s.

## training energy based models

Posted in Books, Statistics on April 7, 2021 by xi'an

This recent arXival by Song and Kingma covers different computational approaches to semi-parametric estimation, but also exposes imho the chasm existing between statistical and machine learning perspectives on the problem.

“Energy-based models are much less restrictive in functional form: instead of specifying a normalized probability, they only specify the unnormalized negative log-probability (…) Since the energy function does not need to integrate to one, it can be parameterized with any nonlinear regression function.”

The above, from the introduction, at first appears to be a strange argument, since the mass one constraint is the least of the problems when addressing non-parametric density estimation. Problems like the convergence, the speed of convergence, the computational cost and the overall integrability of the estimator. It seems however that the restriction or lack thereof is to be understood as the ability to use much more elaborate forms of densities, which are then black-boxes whose components have little relevance… When using such mega-over-parameterised representations of densities, such as neural networks and normalising flows, a statistical assessment leads to highly challenging questions. But convergence (in the sample size) does not appear to be a concern for the paper. (Except for a citation of Hyvärinen on p.5.)

Using MLE in this context appears to be questionable, though, since the base parameter θ is unlikely to remain identifiable. Computing the MLE is therefore a minor issue, in this regard, a resolution based on simulated gradients being well-charted from the earlier era of stochastic optimisation as in Robbins & Monro (1951), Duflo (1996) or Benveniste & al. (1990). (The log-gradient of the normalising constant being estimated by the opposite of the gradient of the energy at a random point.)
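As a sanity check of the parenthetical remark, here is a minimal sketch of the identity ∇θ log Z(θ) = −E[∇θ E_θ(X)], the expectation being under p_θ. Everything in it is my own toy assumption rather than anything taken from the paper: the energy E_θ(x) = θx²/2 (so that p_θ is a N(0,1/θ) distribution and Z(θ) is available in closed form), the function names, and the use of exact Normal draws in lieu of MCMC output. It also averages over many random points rather than using a single one, simply to make the agreement visible.

```python
import numpy as np

def energy_grad_theta(theta, x):
    """Derivative in theta of the toy energy E_theta(x) = theta * x**2 / 2."""
    return 0.5 * x**2

def log_normalising_constant(theta):
    """Closed form for the toy model: Z(theta) = sqrt(2 * pi / theta)."""
    return 0.5 * np.log(2.0 * np.pi / theta)

def simulated_log_z_gradient(theta, n_draws=200_000, seed=0):
    """Monte Carlo version of d/dtheta log Z = -E[dE/dtheta] under p_theta."""
    rng = np.random.default_rng(seed)
    # For this energy p_theta is N(0, 1/theta): exact draws stand in for MCMC.
    x = rng.normal(0.0, 1.0 / np.sqrt(theta), size=n_draws)
    return -np.mean(energy_grad_theta(theta, x))

theta = 2.0
eps = 1e-5
# Central finite difference of the closed-form log-normalising constant.
finite_difference = (log_normalising_constant(theta + eps)
                     - log_normalising_constant(theta - eps)) / (2 * eps)
simulated = simulated_log_z_gradient(theta)
print(finite_difference, simulated)  # both should be close to -1 / (2 * theta)
```

In a genuine energy-based model neither the closed form nor the exact sampler is available, which is where the MCMC cost discussed next comes in.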

“Running MCMC till convergence to obtain a sample x∼p(x) can be computationally expensive.”

Contrastive divergence à la Hinton (2002) is presented as a solution to the convergence problem by stopping early, which seems reasonable given the random gradient is mostly noise. With a possible correction for bias à la Jacob & al. (missing the published version).

An alternative to MLE is the 2005 Hyvärinen score, notorious for bypassing the normalising constant. But blamed in the paper for being costly in the dimension d of the variate x, due to the second derivative matrix. Which can be avoided by using Stein’s unbiased estimator of the risk (yay!) if using randomized data. And surprisingly linked with contrastive divergence as well, if a Taylor expansion is a good enough approximation! An interesting byproduct of the discussion on score matching is to turn it into an unintended form of ABC!
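To see why the score bypasses the normalising constant, here is a hedged one-dimensional sketch (so the second-derivative cost in d is invisible here). The toy energy E_θ(x) = θx²/2, the simulated data, the grid search and all names are my own assumptions, not the paper's. The objective only involves derivatives in x of log p_θ, from which Z(θ) drops out, and in this Gaussian toy its minimiser happens to coincide with the MLE 1/mean(x²).

```python
import numpy as np

def hyvarinen_objective(theta, data):
    """Empirical score matching objective for the toy energy theta * x**2 / 2.

    Uses only x-derivatives of log p_theta, which do not involve Z(theta).
    """
    score = -theta * data   # d/dx log p_theta(x)
    score_dx = -theta       # d^2/dx^2 log p_theta(x)
    return np.mean(0.5 * score**2 + score_dx)

rng = np.random.default_rng(1)
# Simulated data with standard deviation 0.5, i.e. a true theta of 4.
data = rng.normal(0.0, 0.5, size=50_000)

# Crude grid minimisation (step 0.01), enough for a one-parameter toy.
grid = np.linspace(0.1, 10.0, 991)
values = np.array([hyvarinen_objective(t, data) for t in grid])
theta_sm = grid[np.argmin(values)]

theta_mle = 1.0 / np.mean(data**2)  # closed-form MLE in this toy model
print(theta_sm, theta_mle)
```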

“Many methods have been proposed to automatically tune the noise distribution, such as Adversarial Contrastive Estimation (Bose et al., 2018), Conditional NCE (Ceylan and Gutmann, 2018) and Flow Contrastive Estimation (Gao et al., 2020).”

A third approach is the noise contrastive estimation method of Gutmann & Hyvärinen (2010) that connects with both others. And is a precursor of GAN methods, mentioned at the end of the paper via a (sort of) variational inequality.

## ABC & the eighth plague of Egypt [locusts in forests]

Posted in Books, pictures, Statistics, Travel, University life on April 6, 2021 by xi'an

“If you refuse to let them go, I will bring locusts into your country tomorrow. They will cover the face of the ground so that it cannot be seen. They will devour what little you have left after the hail, including every tree that is growing in your fields. They will fill your houses and those of all your officials and all the Egyptians.” Exodus 10:3-6

Marie-Pierre Chapuis, Louis Raynal, and co-authors, mostly from Montpellier, published a paper last year on the evolutionary history of the African arid-adapted pest locust, Schistocerca gregaria, called the eighth plague of Egypt in the Bible. And a cause of a major food disaster in East Africa over the past months. The analysis was run with ABC-RF techniques. The paper was first reviewed in PCI Evolutionary Biology, with the following points:

The present-day distribution of extant species is the result of the interplay between their past population demography (e.g., expansion, contraction, isolation, and migration) and adaptation to the environment (…) The understanding of the key factors driving species evolution gives important insights into how the species may respond to changing conditions, which can be particularly relevant for the management of harmful species, such as agricultural pests.

Meaningful demographic inferences present major challenges. These include formulating evolutionary scenarios fitting species biology and the eco-geographical context and choosing informative molecular markers and accurate quantitative approaches to statistically compare multiple demographic scenarios and estimate the parameters of interest. A further issue comes with result interpretation. Accurately dating the inferred events is far from straightforward since reliable calibration points are necessary to translate the molecular estimates of the evolutionary time into absolute time units (i.e. years). This can be attempted in different ways (…) Nonetheless, most experimental systems rarely meet these conditions, hindering the comprehensive interpretation of results.

The contribution of Chapuis et al. addresses these issues to investigate the recent history of the (…) desert locust (…) Owing to their fast mutation rate microsatellite markers offer at least two advantages: i) suitability for analyzing recently diverged populations, and ii) direct estimate of the germline mutation rate in pedigree samples (…) The main aim of the study is to infer the history of divergence of the two subspecies of the desert locust, which have spatially disjoint distribution corresponding to the dry regions of North and West-South Africa. They first use paleo-vegetation maps to formulate hypotheses about changes in species range since the last glacial maximum. Based on them, they generate 12 divergence models. For the selection of the demographic model and parameter estimation, they apply the recently developed ABC-RF approach (…) Some methodological novelties are also introduced in this work, such as the computation of the error associated with the posterior parameter estimates under the best scenario (…) The best-supported model suggests a recent divergence event of the subspecies of S. gregaria (around 2.6 kya) and a reduction of populations size in one of the subspecies (S. g. flaviventris) that colonized the southern distribution area. As such, results did not support the hypothesis that the southward colonization was driven by the expansion of African dry environments associated with the last glacial maximum (…) The estimated time of divergence points at a much more recent origin for the two subspecies, during the late Holocene, in a period corresponding to fairly stable arid conditions similar to current ones. Although the authors cannot exclude that their microsatellite data bear limited information on older colonization events than the last one, they bring arguments in favour of alternative explanations. 
The hypothesis privileged does not involve climatic drivers, but the particularly efficient dispersal behaviour of the species, whose individuals are able to fly over long distances (up to thousands of kilometers) under favourable windy conditions (…)

There is a growing number of studies in phylogeography in arid regions in the Southern hemisphere, but the impact of past climate changes on the species distribution in this region remains understudied relative to the Northern hemisphere. The study presented by Chapuis et al. offers several important insights into demographic changes and the evolutionary history of an agriculturally important pest species in Africa, which could also mirror the history of other organisms in the continent (…)

Microsatellite markers have been offering a useful tool in population genetics and phylogeography for decades (…) This study reaffirms the usefulness of these classic molecular markers to estimate past demographic events, especially when species- and locus-specific microsatellite mutation features are available and a powerful inferential approach is adopted. Nonetheless, there are still hurdles to overcome, such as the limitations in scenario choice associated with the simulation software used (e.g. not allowing for continuous gene flow in this particular case), which calls for further improvement of simulation tools allowing for more flexible modeling of demographic events and mutation patterns. In sum, this work not only contributes to our understanding of the makeup of the African biodiversity but also offers a useful statistical framework, which can be applied to a wide array of species and molecular markers.

## hands-on probability 101

Posted in Books, Kids, pictures, Statistics, University life on April 3, 2021 by xi'an

When solving a rather simple probability question on X validated, namely the joint uniformity of the pair

$(X,Y)=(A-B+\mathbb I_{A<B},C-B+\mathbb I_{C<B})$

when A,B,C are iid U(0,1), I chose a rather pedestrian way and derived the joint distribution of (A-B,C-B), which turns out to be made of 8 components over the (-1,1)² domain. And to conclude on the uniformity of the above, I added a hand-made picture to explain why the coverage by (X,Y) of any (red) square within (0,1)² was uniform by virtue of the symmetry between the coverage by (A-B,C-B) of four copies of the (red) square, using color tabs that were sitting on my desk..! It did not seem to convince the originator of the question, who kept answering with more questions (or worse, an ever-changing question, reproduced in real time on math.stackexchange!, revealing there that said originator was tutoring an undergrad student!), but this was a light moment in a dreary final day before a new lockdown.
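For those unconvinced by color tabs, a quick simulation check is possible. It is a sketch assuming the pair is made of the differences A-B and C-B wrapped back into (0,1), that is, with one added whenever the difference is negative. The function names, the 4×4 grid and the sample size are arbitrary choices of mine, and a flat table of cell frequencies is of course an illustration, not a proof.

```python
import numpy as np

def wrapped_differences(n, seed=0):
    """Simulate (X, Y) = (A - B + 1{A<B}, C - B + 1{C<B}) for iid U(0,1) A, B, C."""
    rng = np.random.default_rng(seed)
    a, b, c = rng.random((3, n))
    x = a - b + (a < b)  # the indicator shifts negative differences by one
    y = c - b + (c < b)
    return x, y

def cell_frequencies(x, y, bins=4):
    """Proportion of points falling in each cell of a bins x bins grid on (0,1)^2."""
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=[[0, 1], [0, 1]])
    return counts / x.size

x, y = wrapped_differences(400_000)
freqs = cell_frequencies(x, y)
print(np.round(freqs, 4))  # under joint uniformity every cell is near 1/16
```

The same `cell_frequencies` applied to the unwrapped pair (A-B,C-B) rescaled to the unit square would not be flat, which is the point of the four copies of the square in the picture.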