Archive for random forests

day two at ISBA 22

Posted in Mountains, pictures, Running, Statistics, Travel with tags , , , , , , , , , , , , , , , , , , , on June 30, 2022 by xi'an

Still woke up early too early, which let me go for a long run in Mont Royal (which felt almost immediately familiar from earlier runs at MCM 2017!) at dawn and at a pleasant temperature (but missed the top bagel bakery on the way back!). Skipped the morning plenary lectures to complete recommendation letters and finishing a paper submission. But had a terrific lunch with a good friend I had not seen in Covid-times, at a local branch of Kinton Ramen which I already enjoyed in Vancouver as my Airbnb was located on top of it.

I chaired the afternoon Bayesian computations session with Onur Teymur presenting the general spirit of his Neurips 21 paper on black box probabilistic numerics. Mentioning that a new textbook on the topic by Phillip Henning, Michael Osborne, and Hans Kersting had appeared today! The second talk was by Laura Bondi who discussed an ABC model choice approach to assess breast cancer screening. With enough missing data (out of 78051 women followed over 12 years) to lead to an intractable likelihood. Starting with vanilla ABC using 32 summaries and moving to our random forest approach. Unsurprisingly concluding with different top models, but not characterising the identifiability provided by the choice of the summaries. The third talk was by Ryan Chan (fresh Warwick PhD recipient), about a Fusion divide-and-conquer approach that avoids the approximation of earlier approaches. In particular he uses a clever accept-reject algorithm to generate a product of densities using the component densities. A nice trick that Murray explained to me while visiting in Paris lg ast month. (The approach appears to be parameterisation dependent.) The final talk was by Umberto Picchini and in a sort the synthetic likelihood mirror of Massi’s talk yesterday, in the sense of constructing a guided proposal relying on observed summaries. If not comparing both approaches on a given toy like the g-and-k distribution.

finding our way in the dark

Posted in Books, pictures, Statistics with tags , , , , , , , , , on November 18, 2021 by xi'an

The paper Finding our Way in the Dark: Approximate MCMC for Approximate Bayesian Methods by Evgeny Levi and (my friend) Radu Craiu, recently got published in Bayesian Analysis. The central motivation for their work is that both ABC and synthetic likelihood are costly methods when the data is large and does not allow for smaller summaries. That is, when summaries S of smaller dimension cannot be directly simulated. The idea is to try to estimate

h(\theta)=\mathbb{P}_\theta(d(S,S^\text{obs})\le\epsilon)

since this is the substitute for the likelihood used for ABC. (A related idea is to build an approximate and conditional [on θ] distribution on the distance, idea with which Doc. Stoehr and I played a wee bit without getting anything definitely interesting!) This is a one-dimensional object, hence non-parametric estimates could be considered… For instance using k-nearest neighbour methods (which were already linked with ABC by Gérard Biau and co-authors.) A random forest could also be used (?). Or neural nets. The method still requires a full simulation of new datasets, so I wonder at the gain unless the replacement of the naïve indicator with h(θ) brings clear improvement to the approximation. Hence much fewer simulations. The ESS reduction is definitely improved, esp. since the CPU cost is higher. Could this be associated with the recourse to independent proposals?

In a sence, Bayesian synthetic likelihood does not convey the same appeal, since is a bit more of a tough cookie: approximating the mean and variance is multidimensional. (BSL is always more expensive!)

As a side remark, the authors use two chains in parallel to simplify convergence proofs, as we did a while ago with AMIS!

the new DIYABC-RF

Posted in Books, pictures, R, Statistics, Wines with tags , , , , , , , , , , , , , , , , on April 15, 2021 by xi'an

My friends and co-authors from Montpellier have released last month the third version of the DIYABC software, DIYABC-RF, which includes and promotes the use of random forests for parameter inference and model selection, in connection with Louis Raynal’s thesis. Intended as the earlier versions of DIYABC for population genetic applications. Bienvenue!!!

The software DIYABC Random Forest (hereafter DIYABC-RF) v1.0 is composed of three parts: the dataset simulator, the Random Forest inference engine and the graphical user interface. The whole is packaged as a standalone and user-friendly graphical application named DIYABC-RF GUI and available at https://diyabc.github.io. The different developer and user manuals for each component of the software are available on the same website. DIYABC-RF is a multithreaded software on three operating systems: GNU/Linux, Microsoft Windows and MacOS. One can use the program can be used through a modern and user-friendly graphical interface designed as an R shiny application (Chang et al. 2019). For a fluid and simplified user experience, this interface is available through a standalone application, which does not require installing R or any dependencies and hence can be used independently. The application is also implemented in an R package providing a standard shiny web application (with the same graphical interface) that can be run locally as any shiny application, or hosted as a web service to provide a DIYABC-RF server for multiple users.

ABC & the eighth plague of Egypt [locusts in forests]

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on April 6, 2021 by xi'an

“If you refuse to let them go, I will bring locusts into your country tomorrow. They will cover the face of the ground so that it cannot be seen. They will devour what little you have left after the hail, including every tree that is growing in your fields. They will fill your houses and those of all your officials and all the Egyptians.” Exodus 10:3-6

Marie-Pierre Chapuis, Louis Raynal, and co-authors, mostly from Montpellier, published last year a paper on the evolutionary history of the African arid-adapted pest locust, Schistocerca gregaria, called the eighth plague of Egypt in the Bible. And a cause for a major food disaster in East Africa over the past months. The analysis was run with ABC-RF techniques. The paper was first reviewed in PCI Evolutionary Biology, with the following points:

The present-day distribution of extant species is the result of the interplay between their past population demography (e.g., expansion, contraction, isolation, and migration) and adaptation to the environment (…) The understanding of the key factors driving species evolution gives important insights into how the species may respond to changing conditions, which can be particularly relevant for the management of harmful species, such as agricultural pests.

Meaningful demographic inferences present major challenges. These include formulating evolutionary scenarios fitting species biology and the eco-geographical context and choosing informative molecular markers and accurate quantitative approaches to statistically compare multiple demographic scenarios and estimate the parameters of interest. A further issue comes with result interpretation. Accurately dating the inferred events is far from straightforward since reliable calibration points are necessary to translate the molecular estimates of the evolutionary time into absolute time units (i.e. years). This can be attempted in different ways (…) Nonetheless, most experimental systems rarely meet these conditions, hindering the comprehensive interpretation of results.

The contribution of Chapuis et al. addresses these issues to investigate the recent history of the (…) desert locust (…) Owing to their fast mutation rate microsatellite markers offer at least two advantages: i) suitability for analyzing recently diverged populations, and ii) direct estimate of the germline mutation rate in pedigree samples (…) The main aim of the study is to infer the history of divergence of the two subspecies of the desert locust, which have spatially disjoint distribution corresponding to the dry regions of North and West-South Africa. They first use paleo-vegetation maps to formulate hypotheses about changes in species range since the last glacial maximum. Based on them, they generate 12 divergence models. For the selection of the demographic model and parameter estimation, they apply the recently developed ABC-RF approach (…) Some methodological novelties are also introduced in this work, such as the computation of the error associated with the posterior parameter estimates under the best scenario (…) The best-supported model suggests a recent divergence event of the subspecies of S. gregaria (around 2.6 kya) and a reduction of populations size in one of the subspecies (S. g. flaviventris) that colonized the southern distribution area. As such, results did not support the hypothesis that the southward colonization was driven by the expansion of African dry environments associated with the last glacial maximum (…) The estimated time of divergence points at a much more recent origin for the two subspecies, during the late Holocene, in a period corresponding to fairly stable arid conditions similar to current ones. Although the authors cannot exclude that their microsatellite data bear limited information on older colonization events than the last one, they bring arguments in favour of alternative explanations. The hypothesis privileged does not involve climatic drivers, but the particularly efficient dispersal behaviour of the species, whose individuals are able to fly over long distances (up to thousands of kilometers) under favourable windy conditions (…)

There is a growing number of studies in phylogeography in arid regions in the Southern hemisphere, but the impact of past climate changes on the species distribution in this region remains understudied relative to the Northern hemisphere. The study presented by Chapuis et al. offers several important insights into demographic changes and the evolutionary history of an agriculturally important pest species in Africa, which could also mirror the history of other organisms in the continent (…)

Microsatellite markers have been offering a useful tool in population genetics and phylogeography for decades (…) This study reaffirms the usefulness of these classic molecular markers to estimate past demographic events, especially when species- and locus-specific microsatellite mutation features are available and a powerful inferential approach is adopted. Nonetheless, there are still hurdles to overcome, such as the limitations in scenario choice associated with the simulation software used (e.g. not allowing for continuous gene flow in this particular case), which calls for further improvement of simulation tools allowing for more flexible modeling of demographic events and mutation patterns. In sum, this work not only contributes to our understanding of the makeup of the African biodiversity but also offers a useful statistical framework, which can be applied to a wide array of species and molecular markers.

 

 

quoting a book review of mine?!

Posted in Books, Statistics, University life with tags , , , , , , on January 10, 2020 by xi'an
%d bloggers like this: