Archive for k nearest neighbour

data science in La X
Posted in Books, Kids, pictures, Statistics with tags AI, CNRS, COVID-19, data science, ethics, France, interview, journalism, k nearest neighbour, La Croix, ModCov19, modelling, pandemic on January 25, 2022 by xi'an

As the [catholic] daily La X has a special “Sciences&éthique” report on data science and scientists, my mom [a long-time subscriber] mailed me [by post] the central pages where it appeared. The contents are not great, focusing as often on a few quotable sentences and missing the fundamental limitations of self-learning algorithms. As an aside, the leaflet contained a short interview with Jean-Stéphane Dhersin, who heads the CNRS ModCov19 centralising platform [and is anecdotally a neighbour], on the notion that a predictive model in epidemiology can be both scientific and imprecise.

finding our way in the dark
Posted in Books, pictures, Statistics with tags AMIS, approximate Bayesian inference, approximate MCMC, Bayesian Analysis, Bayesian synthetic likelihood, conditional density, k nearest neighbour, knn estimator, random forests, storm on November 18, 2021 by xi'an

The paper Finding our Way in the Dark: Approximate MCMC for Approximate Bayesian Methods by Evgeny Levi and (my friend) Radu Craiu recently got published in Bayesian Analysis. The central motivation for their work is that both ABC and synthetic likelihood are costly methods when the data is large and does not allow for smaller summaries, that is, when summaries S of smaller dimension cannot be directly simulated. The idea is to try to estimate the acceptance probability
h(θ) = P( d(S(x), S(x_obs)) ≤ ε | θ ),

since this is the substitute for the likelihood used for ABC. (A related idea is to build an approximate and conditional [on θ] distribution on the distance, an idea with which Doc. Stoehr and I played a wee bit without getting anything definitely interesting!) This is a one-dimensional object, hence non-parametric estimates could be considered… For instance k-nearest neighbour methods (which were already linked with ABC by Gérard Biau and co-authors). A random forest could also be used (?). Or neural nets. The method still requires a full simulation of new datasets, so I wonder at the gain, unless the replacement of the naïve indicator with h(θ) brings a clear enough improvement to the approximation to allow for many fewer simulations. The ESS reduction is definitely improved, especially since the CPU cost is higher. Could this be associated with the recourse to independent proposals?
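For concreteness, here is a minimal sketch of that idea in Python, on a toy Gaussian-mean example of my own making (the model, prior, tolerance, and all tuning constants are assumptions, not the paper's setup): a pilot batch of simulations is used to fit a k-nearest-neighbour regression of the 0/1 acceptance indicator on θ, and the fitted h(θ) then stands in for the raw indicator inside a random-walk Metropolis ratio.

```python
# Illustrative sketch (not the authors' implementation): estimate the ABC
# acceptance probability h(theta) = P( d(S(x), S(x_obs)) <= eps | theta ) by
# kNN regression on pilot simulations, then plug it into Metropolis-Hastings.
# Toy model, prior, and tuning constants below are all assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n, eps, k = 50, 0.1, 25

x_obs = rng.normal(1.0, 1.0, size=n)        # pretend observed data
s_obs = x_obs.mean()                         # summary statistic S

def simulate_summary(theta):
    return rng.normal(theta, 1.0, size=n).mean()

# pilot stage: (theta_i, indicator_i) pairs drawn from the prior predictive
theta_pilot = rng.normal(0.0, 5.0, size=5000)            # N(0, 5^2) prior draws
ind_pilot = np.array([abs(simulate_summary(t) - s_obs) <= eps for t in theta_pilot],
                     dtype=float)

# kNN estimate of h(theta): local average of the acceptance indicators
knn = KNeighborsRegressor(n_neighbors=k).fit(theta_pilot[:, None], ind_pilot)
h_hat = lambda theta: knn.predict([[theta]])[0]

def log_prior(theta):
    return -0.5 * (theta / 5.0) ** 2

# random-walk Metropolis with h_hat(theta) as the likelihood substitute
theta, chain = 0.0, []
for _ in range(20000):
    prop = theta + rng.normal(scale=0.5)
    log_alpha = (np.log(h_hat(prop) + 1e-12) - np.log(h_hat(theta) + 1e-12)
                 + log_prior(prop) - log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:
        theta = prop
    chain.append(theta)

print("posterior mean estimate:", np.mean(chain[5000:]))
```

Of course, in this sketch the kNN surrogate is only as good as the pilot coverage of the θ space, which is where any saving over simulating fresh datasets at every MCMC step would have to come from.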
In a sense, Bayesian synthetic likelihood does not convey the same appeal, since it is a bit more of a tough cookie: approximating the mean and variance of the summaries is a multidimensional problem. (BSL is always more expensive!)
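Indeed, a single synthetic-likelihood evaluation already involves estimating a full mean vector and covariance matrix of the summaries from fresh simulations at each value of θ. A minimal sketch, again a toy example with assumed summaries and model rather than the authors' code:

```python
# Illustrative BSL evaluation: score the observed summaries under a Gaussian
# whose mean vector and covariance matrix are estimated from m simulations at
# the current theta; the multidimensional estimation is the costly part.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

def summaries(x):
    # small assumed summary vector: mean, standard deviation, lag-1 cross moment
    return np.array([x.mean(), x.std(), np.mean(x[:-1] * x[1:])])

def simulate(theta, n=100):
    return rng.normal(theta, 1.0, size=n)    # toy data-generating model

def synthetic_loglik(theta, s_obs, m=200):
    sims = np.array([summaries(simulate(theta)) for _ in range(m)])
    mu_hat = sims.mean(axis=0)                # estimated mean of the summaries
    sigma_hat = np.cov(sims, rowvar=False)    # estimated covariance matrix
    return multivariate_normal(mu_hat, sigma_hat).logpdf(s_obs)

s_obs = summaries(simulate(1.0))
print(synthetic_loglik(0.8, s_obs), synthetic_loglik(1.0, s_obs))
```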
As a side remark, the authors use two chains in parallel to simplify convergence proofs, as we did a while ago with AMIS!