## data science in La X

Posted in Books, Kids, pictures, Statistics on January 25, 2022 by xi'an

As the [Catholic] daily La X had a special “Sciences&éthique” report on data science and scientists, my mom [a long-time subscriber] mailed me [by post] the central pages where it appeared. The contents are not great. As often, they focus on a few sentences from  and miss the fundamental limitations of self-learning algorithms. As an aside, the leaflet contained a short interview with Jean-Stéphane Dhersin, who is head of the CNRS ModCov19 centralising platform [and anecdotally a neighbour], on the notion that a predictive model in epidemiology can be both scientific and imprecise.

## finding our way in the dark

Posted in Books, pictures, Statistics on November 18, 2021 by xi'an

The paper Finding our Way in the Dark: Approximate MCMC for Approximate Bayesian Methods by Evgeny Levi and (my friend) Radu Craiu recently got published in Bayesian Analysis. The central motivation for their work is that both ABC and synthetic likelihood are costly methods when the data is large and does not allow for smaller summaries, that is, when summaries S of smaller dimension cannot be simulated directly. The idea is to try to estimate

$h(\theta)=\mathbb{P}_\theta(d(S,S^\text{obs})\le\epsilon)$
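As a concrete illustration, the plain Monte Carlo estimate of h(θ) simply averages m indicators. With m = 1 it reduces to the single indicator used in standard ABC-MCMC. The sketch below rests on a hypothetical toy model that is not taken from the paper: n draws from N(θ, 1), summarised by their mean, with d taken as the absolute difference. The names `simulate_summary`, `h_hat` and `h_exact` are illustrative only.

```python
import math
import random
import statistics


def simulate_summary(theta, n, rng):
    """Hypothetical toy model: n draws from N(theta, 1), summarised by their mean."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def h_hat(theta, s_obs, eps, n, m, rng):
    """Monte Carlo estimate of h(theta) = P_theta(|S - s_obs| <= eps).

    Averages m indicators; m = 1 gives the naive single-draw indicator
    of standard ABC-MCMC.
    """
    hits = sum(abs(simulate_summary(theta, n, rng) - s_obs) <= eps for _ in range(m))
    return hits / m


def h_exact(theta, s_obs, eps, n):
    """Closed form for this toy model only: S ~ N(theta, 1/n)."""
    sd = 1.0 / math.sqrt(n)

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - theta) / (sd * math.sqrt(2.0))))

    return cdf(s_obs + eps) - cdf(s_obs - eps)


rng = random.Random(1)
print(h_hat(0.0, 0.0, 0.1, 10, 5000, rng), h_exact(0.0, 0.0, 0.1, 10))
```

The closed form exists only because the toy summary is Gaussian. In the settings targeted by the paper no such formula is available, which is why h(θ) has to be estimated.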

since this is the substitute for the likelihood used in ABC. (A related idea is to build an approximate distribution of the distance, conditional on θ. Doc. Stoehr and I played a wee bit with that idea without getting anything definitely interesting!) This is a one-dimensional object, hence non-parametric estimates could be considered… for instance k-nearest neighbour methods (which were already linked with ABC by Gérard Biau and co-authors). A random forest could also be used (?), or neural nets.

The method still requires a full simulation of new datasets, so I wonder at the gain. It only pays off if replacing the naïve indicator with h(θ) brings a clear improvement to the approximation, and hence requires far fewer simulations. The ESS reduction is definitely improved, especially since the CPU cost is higher. Could this be associated with the recourse to independent proposals?
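To make the k-nearest-neighbour option concrete, here is a sketch in the spirit of the idea, not a reproduction of the authors' algorithm. Past simulations are stored as pairs (θ_i, 1{d(S_i, S^obs) ≤ ε}). The value h(θ) is then estimated by averaging the indicators at the k stored θ_i closest to θ. This estimate replaces the likelihood in a Metropolis-Hastings chain with an independent proposal.

Everything here is a hypothetical choice for illustration:

- the toy model (n draws from N(θ, 1), summarised by their mean);
- the uniform prior and uniform independent proposal on [-1, 1];
- k = 50;
- the helper names `build_bank`, `knn_h` and `approx_abc_mcmc`;
- a bank frozen before the chain starts. The post notes that the actual method still simulates new datasets.

```python
import heapq
import random
import statistics


def simulate_summary(theta, n, rng):
    """Hypothetical toy model: n draws from N(theta, 1), summarised by their mean."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def build_bank(s_obs, eps, n, size, rng, lo=-1.0, hi=1.0):
    """Store `size` (theta, ABC indicator) pairs with theta drawn uniformly on [lo, hi]."""
    bank = []
    for _ in range(size):
        theta = rng.uniform(lo, hi)
        hit = abs(simulate_summary(theta, n, rng) - s_obs) <= eps
        bank.append((theta, 1.0 if hit else 0.0))
    return bank


def knn_h(theta, bank, k):
    """k-NN estimate of h(theta): mean indicator over the k stored thetas closest to theta."""
    nearest = heapq.nsmallest(k, bank, key=lambda pair: abs(pair[0] - theta))
    return sum(hit for _, hit in nearest) / len(nearest)


def approx_abc_mcmc(theta0, bank, k, iters, rng, lo=-1.0, hi=1.0):
    """Independent Metropolis-Hastings with the k-NN surrogate in place of h(theta).

    With a uniform prior and a uniform independent proposal on [lo, hi], the
    acceptance ratio reduces to knn_h(proposal) / knn_h(current).
    """
    theta = theta0
    h_cur = knn_h(theta, bank, k)
    chain = []
    for _ in range(iters):
        prop = rng.uniform(lo, hi)
        h_prop = knn_h(prop, bank, k)
        # If the current surrogate is zero, any proposal is accepted to escape.
        if h_cur == 0.0 or rng.random() < h_prop / h_cur:
            theta, h_cur = prop, h_prop
        chain.append(theta)
    return chain


bank = build_bank(0.2, 0.1, 10, 1000, random.Random(3))
chain = approx_abc_mcmc(0.2, bank, 50, 1000, random.Random(4))
print(statistics.fmean(chain))
```

Because the surrogate is a smoothed average over neighbours, a single noisy indicator no longer decides acceptance on its own. This is one way the replacement of the naïve indicator can help.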

In a sense, Bayesian synthetic likelihood does not carry the same appeal, since it is a bit more of a tough cookie: approximating the mean and variance of the summaries is a multidimensional problem. (BSL is always more expensive!)
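To see why the synthetic likelihood is multidimensional, here is a sketch of the standard Gaussian synthetic log-likelihood. For each θ, m summaries are simulated, a mean vector and covariance matrix are fitted, and the N(μ, Σ) log density is evaluated at the observed summary. The toy model is hypothetical: n draws from N(θ, 1), summarised by (sample mean, sample variance), so d = 2. The helper names are illustrative too.

```python
import math
import random
import statistics


def simulate_summaries(theta, n, rng):
    """Hypothetical toy model: n draws from N(theta, 1), summarised by
    (sample mean, sample variance), a 2-dimensional summary."""
    y = [rng.gauss(theta, 1.0) for _ in range(n)]
    return [statistics.fmean(y), statistics.variance(y)]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def synthetic_loglik(theta, s_obs, n, m, rng):
    """Gaussian synthetic log-likelihood at theta, estimated from m simulations."""
    sims = [simulate_summaries(theta, n, rng) for _ in range(m)]
    d = len(s_obs)
    mu = [statistics.fmean(s[i] for s in sims) for i in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in sims) / (m - 1)
            for j in range(d)] for i in range(d)]
    L = cholesky(cov)
    # Forward substitution solves L z = s_obs - mu; the Mahalanobis term is z^T z.
    z = []
    for i in range(d):
        r = s_obs[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


rng = random.Random(5)
print(synthetic_loglik(0.3, [0.3, 1.0], 20, 200, rng))
```

Each evaluation needs m full simulations to fit d + d(d+1)/2 moments (5 when d = 2). That count grows quadratically with the summary dimension. By contrast, h(θ) is a single probability.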

As a side remark, the authors use two chains in parallel to simplify convergence proofs, as we did a while ago with AMIS!