data assimilation and reduced modelling for high-D problems [CIRM]

Next summer, from 19 July till 27 August, there will be a six-week program at CIRM on the above theme, bringing together scientists from both the academic and industrial communities. The program includes a one-week summer school followed by five weeks of research sessions on projects proposed by academic and industrial partners.

Confirmed speakers of the summer school (Jul 19-23) are:

  • Albert Cohen (Sorbonne University)
  • Masoumeh Dashti (University of Sussex)
  • Eric Moulines (Ecole Polytechnique)
  • Anthony Nouy (Ecole Centrale de Nantes)
  • Claudia Schillings (Mannheim University)

Junior participants may apply for fellowships covering part or all of their stay. Registration and fellowship applications will open soon.

end-to-end Bayesian learning [CIRM]

Next Fall, there will be a workshop at CIRM, Luminy, Marseilles, on Bayesian learning. It takes place 22-29 October 2021 on this wonderful campus at the border with the beautiful Parc National des Calanques, in a wonderfully renovated CIRM building, and involves friends and colleagues of mine as organisers and plenary speakers. (I am not involved, but plan to organise a scalable MCMC workshop there the year after!) The conference is well-supported and the housing fees will be minimal since the centre is also subsidized by CNRS. The deadline for contributed talks and posters is 22 March, while the deadline for registration is 15 June. Hopefully by this time the horizon will have cleared up enough to consider traveling and meeting again. Hopefully. (In which case I will miss this wonderful conference due to other meeting and teaching commitments in the Fall.)

Bayesian phylogeographic inference of SARS-CoV-2

Nature Communications of 10 October has a paper by Philippe Lemey et al. (incl. Marc Suchard) on including travel history and removing sampling bias in the study of the spread of the virus. (Which I was asked to review for a CNRS COVID watch platform, Bibliovid.)

The data consists of curated genomes available in GISAID on March 10, that is, before lockdown even started in France, with (trustworthy?) travel history data for over 20% of the sampled patients. (And an unwelcome reminder that Hong Kong is part of China, at a time of repression and “mainlandisation” by the CCP.)

“we model a discrete diffusion process between 44 locations within China, including 13 provinces, one municipality (Beijing), and one special administrative area (Hong Kong). We fit a generalized linear model (GLM) parameterization of the discrete diffusion process…”
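
To make the GLM parameterization concrete, here is a minimal sketch assuming a log-linear link, where the log of each between-location rate is a weighted sum of covariates with optional 0/1 inclusion indicators. The function name, the toy distance matrix and the coefficient value are all hypothetical and not taken from the paper.

```python
import numpy as np

def glm_rate_matrix(covariates, betas, deltas):
    """Build a CTMC rate matrix whose off-diagonal rates are log-linear in covariates.

    covariates: array of shape (P, K, K), one K x K predictor matrix per covariate
    betas:      array of shape (P,), effect sizes (hypothetical values)
    deltas:     array of shape (P,), 0/1 inclusion indicators
    """
    covariates = np.asarray(covariates, dtype=float)
    coef = np.asarray(betas, dtype=float) * np.asarray(deltas, dtype=float)
    log_rates = np.tensordot(coef, covariates, axes=1)  # shape (K, K)
    Q = np.exp(log_rates)
    np.fill_diagonal(Q, 0.0)             # no self-transitions
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a rate matrix sum to zero
    return Q

# Toy example: three locations and a single covariate, the log-distance.
dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 2.0],
                 [4.0, 2.0, 0.0]])
log_dist = np.log(dist + np.eye(3))      # the diagonal is ignored anyway
Q_glm = glm_rate_matrix([log_dist], betas=[-1.0], deltas=[1])
```

With a coefficient of -1 on the log-distance, the rate between two locations is simply the inverse of their distance. In a Bayesian analysis the coefficients and indicators would be sampled along with everything else.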

The diffusion is actually a continuous-time Markov process, with a phylogeny that incorporates nodes associated with location. The Bayesian analysis of the model is carried out by MCMC since, unlike in settings that call for ABC, the likelihood can be computed by Felsenstein’s pruning algorithm. The covariates are used to calibrate the Markov process transitions between locations. The paper also includes a posterior predictive accuracy assessment.
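
As a minimal sketch of the pruning recursion for a single discrete trait (location) on a fixed tree, assuming a known rate matrix: each node returns the probability of the tip locations below it given its own state, and the root vector is averaged over a prior. The tree encoding (a tip is an integer location, an internal node is a tuple of (child, branch length) pairs) and all function names are hypothetical. This is not the implementation used by the authors, where the tree and rates are themselves sampled by MCMC.

```python
import numpy as np

def transition_matrix(Q, t, terms=30):
    """Approximate exp(Q t) by scaling and squaring a truncated Taylor series."""
    A = np.asarray(Q, dtype=float) * t
    norm = np.abs(A).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = A / (2 ** s)                     # shrink so the series converges fast
    P = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        P = P + term
    for _ in range(s):                   # undo the scaling by repeated squaring
        P = P @ P
    return P

def partial_likelihood(node, Q):
    """Vector L with L[s] = P(tip locations below node | node is in state s)."""
    K = Q.shape[0]
    if isinstance(node, int):            # tip: observed location
        L = np.zeros(K)
        L[node] = 1.0
        return L
    L = np.ones(K)
    for child, t in node:                # internal node: product over children
        L *= transition_matrix(Q, t) @ partial_likelihood(child, Q)
    return L

def tree_likelihood(tree, Q, root_prior):
    """Likelihood of the tip locations, averaging over the root state."""
    return float(np.asarray(root_prior) @ partial_likelihood(tree, Q))

# Toy example: three locations with equal rates and a three-tip tree.
Q = np.ones((3, 3)) - 3.0 * np.eye(3)
prior = np.full(3, 1.0 / 3.0)
tree = ((((0, 0.1), (1, 0.1)), 0.2), (2, 0.3))
lik = tree_likelihood(tree, Q, prior)
```

The cost is linear in the number of tips, which is what makes a likelihood-based MCMC feasible here.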

“…we generate Markov jump estimates of the transition histories that are averaged over the entire posterior in our Bayesian inference.”

In particular, the paper describes “travel-aware reconstruction” analyses that track the spatial path followed by a virus until collection, as below. The top graph represents the posterior probability distribution of this path. Given the lack of representativeness, the authors also develop an additional “approach that adds unsampled taxa to assess the sensitivity of inferences to sampling bias”, although it mostly reflects the assumptions made in producing the artificial data. (With a possible connection with ABC?) If I understood correctly, they added 458 taxa for 14 locations.

An interesting opening made in the conclusion about the scalability of the approach:

“With the large number of SARS-CoV-2 genomes now available, the question arises how scalable the incorporation of un-sampled taxa will be. For computationally expensive Bayesian inferences, the approach may need to go hand in hand with down-sampling procedures or more detailed examination of specific sub-lineages.”

In the end, I find it hard, as with other COVID-related papers I read, to check how much the limitations, errors, truncations, etc., attached to the data at hand impact the validation of this phylogeographic reconstruction, and how the model can help beyond reconstructing histories of contamination at the (relatively) early stage.

Irène Waldspurger, CNRS bronze medal

My colleague at Paris Dauphine, Irène Waldspurger, got one of the prestigious CNRS bronze medals this year. Irène is working on inverse problems and machine learning, with applications to sensing and imaging. Congrats!