deaths at sea and a workshop

For several years, actually since the beginning of the Syrian revolution, I have been looking for data on, and for statisticians working on, migrant deaths resulting from crossing the Mediterranean. With very little success, either because the researchers I met had poor and fragmented data, or because the agencies I contacted showed no (good) will in sharing these statistics, Frontex being the most blatant example. I thus read with a lot of interest this article, “Uncounted: Invisible Deaths on Europe’s Borders”, which analyses the reasons for not producing statistics on the deaths at sea of desperate migrants crossing in ill-suited boats.

In connection with this pressing issue, Kerrie Mengersen, Pierre Pudlo and I are organising next November a small workshop on Young Bayesians and Big Data for social good, at CIRM, Marseille, France. It will take place on the weekend before our main conference, Bayesian statistics in the Big Data era, that is, on 23-26 November 2018. Registration is free (and on-site accommodation is cheap) but the number of attendees is limited, so apply asap! Senior participants include at this stage Tamara Broderick (MIT), Julien Cornebise (Element AI, TBC), David Corliss (Peace Work), Ruth King (Edinburgh), and Cody Ross (UCSD, TBC). The workshop aims at bringing participants to work together on methodological challenges and characteristic datasets. Its outcome will be presented at the beginning of the Bayesian statistics in the Big Data era conference, on Monday 26 November.

Bayesian statistics in the big data era

In conjunction with Kerrie Mengersen’s Jean Morlet Chair at CIRM, Luminy, Marseilles, we are organising a special conference, “Bayesian statistics in the big data era”, on November 26-30, 2018, with the following speakers having already confirmed attendance:

Louis Aslett (Durham, UK)
Sudipto Banerjee (UCLA, US)
Tamara Broderick (MIT, US)
Noël Cressie (Wollongong, OZ)
Marco Cuturi (ENSAE, FR)
David Dunson (Duke, US)
Sylvia Frühwirth-Schnatter (WU, AT)
Amy Herring (Duke, US)
Gregor Kastner (WU, AT)
Ruth King (Edinburgh, UK)
Gary Koop (Edinburgh, UK)
Antonio Lijoi (Bocconi, IT)
Jean-Michel Marin (Montpellier, FR)
Antonietta Mira (Lugano, CH)
Peter Müller (UT Austin, US)
Igor Pruenster (Bocconi, IT)
Stéphane Robin (INRA, FR)
Heejung Shim (U Melbourne, OZ)
Minh-Ngoc Tran (UNSW, OZ)
Darren Wilkinson (Newcastle, UK)


Registration is free but compulsory, and we encourage all interested data scientists (and beyond) to apply and to contribute a talk or a poster. The size of the audience is limited to a maximum of 80 participants, on a first-come, first-served basis. (Cheap housing is available on the campus, located in the gorgeous Parc National des Calanques, south of Marseilles.)

In connection with this conference, there will be a workshop the previous weekend on “Young Bayesians and Big Data for social good”, to get junior researchers interested in the analysis of data related to social issues and human rights to work with a few senior researchers. More details soon, here and on the CIRM website.

Masterclass in Bayesian Statistics in Marseilles next Fall

This post is to announce a second occurrence of the exciting “masterclass in Bayesian Statistics” that we organised in 2016, near Marseilles. It will take place on 22-26 October 2018, once more at CIRM (Centre International de Recherches Mathématiques, Luminy, Marseilles, France). The targeted audience includes all scientists interested in learning how Bayesian inference may be used to tackle the practical problems they face in their own research. In particular, PhD students and post-docs should benefit most directly from this masterclass. Among the invited speakers, Kerrie Mengersen from QUT, Brisbane, visiting Marseilles this Fall, will deliver a series of lectures on the interface between Bayesian statistics and applied modelling, Håvard Rue from KAUST will talk on computing with INLA, and Aki Vehtari from Aalto U, Helsinki, will give a course on Bayesian model assessment and model choice. There will be two tutorials, on R and on Stan.

All interested participants in this masterclass should pre-register as early as possible, given that the total attendance is limited to roughly 90 participants. Some specific funding for local expenses (i.e., food + accommodation on-site at CIRM) is available (thanks to CIRM, and potentially to Fondation Jacques Hadamard, to be confirmed); this funding will be attributed by the scientific committee, with high priority to PhD students and post-docs.

back from CIRM

near Col de Sugiton, Parc National des Calanques, Marseille, March 01, 2016

As should be clear from earlier posts, I tremendously enjoyed this past week at CIRM, Marseille, and not only for providing a handy retreat from where I could go running and climbing at least twice a day! The programme (with slides and films soon to be available on the CIRM website) was very well-designed, with mini-courses and talks of appropriate length and frequency. Thanks to Nicolas Chopin (ENSAE ParisTech) and Gilles Celeux (Inria Paris) for constructing this programme so efficiently and to the local organisers Thibaut Le Gouic (Ecole Centrale de Marseille), Denys Pommeret (Aix-Marseille Université), and Thomas Willer (Aix-Marseille Université) for handling the practical side of inviting and accommodating close to a hundred participants on this rather secluded campus. I hope we can reproduce the experiment a few years from now. Maybe in 2018 if we manage to squeeze it between BayesComp 2018 [ex-MCMski] and ISBA 2018 in Edinburgh.

One of the bonuses of staying at CIRM is indeed that it is fairly isolated and far from the fury of down-town Marseille, which may sound like a drag, but actually helps with concentration and interactions. In fact, the whole Aix-Marseille University campus of Luminy on which CIRM is located is surprisingly quiet: we were there in the very middle of the teaching semester and saw very few students around (although even fewer boars!). It is a bit of a mystery that a campus built in such a beautiful location, with the Mont Puget as its background and the song of cicadas as the only source of “noise”, is not better exploited towards attracting more researchers and students. However, remoteness and lack of efficient public transportation may explain a lot about this low occupation of the campus. As may the poor quality of most buildings on the campus, which must be unbearable during the summer months…

In planning a potential future Bayesian week at CIRM, I think we could have some sort of poster session after dinner (with maybe a cash bar operated by some of the invited students, since there is no bar at CIRM or around). Or trail-running under moonlight, trying to avoid tripping over rummaging boars… A sort of Kaggle challenge would be nice but presumably too hard to organise. As a simpler joint activity, we could collectively contribute to some Wikipedia pages related to Bayesian and computational statistics.

Sugiton at dawn

nigh boar at CIRM

a foraging boar a few metres away from CIRM, not even bothered by my presence, Marseille, March 04, 2016

at CIRM [#3]

Simon Barthelmé gave his mini-course on EP, with loads of details on the implementation of the method. Focussing on the EP-ABC and MCMC-EP versions today. Leaving open the difficulty of assessing to which limit EP is converging. But mentioning the potential for asynchronous EP (on which I would like to hear more). Ironically using several times a logistic regression example, if not on the Pima Indians benchmark! He also talked about approximate EP solutions that relate to consensus MCMC. With a connection to Mark Beaumont’s talk at NIPS [at the same time as mine!] on the comparison with ABC. While we saw several talks on EP during this week, I am still agnostic about the potential of the approach. It certainly produces a fast proxy to the true posterior and hence can be exploited ad nauseam in inference methods based on pseudo-models like indirect inference. In conjunction with other quick and dirty approximations when available. As in ABC, it would be most useful to know how far from the (ideal) posterior distribution the approximation stands. Machine learning approaches presumably allow for an evaluation of the predictive performances, but less so for the modelling accuracy, even with new sampling steps. [But I know nothing, I know!]

Dennis Prangle presented some on-going research on high dimension [data] ABC. Raising the question of what is the true meaning of dimension in ABC algorithms. Or of sample size. Because the inference relies on the event d(s(y),s(y’))≤ξ or on the likelihood l(θ|x). Both one-dimensional. Mentioning Iain Murray’s talk at NIPS [that I also missed]. Re-expressing as well the perspective that ABC can be seen as a missing or estimated normalising constant problem as in Bornn et al. (2015) I discussed earlier. The central idea is to use SMC to simulate a particle cloud evolving as the target tolerance ξ decreases. Which supposes a latent variable structure lurking in the background.
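
To make the point about the one-dimensional acceptance event concrete, here is a minimal rejection-ABC sketch in Python. It is not the SMC scheme from the talk: the Normal model, the N(0, 10²) prior, the sample mean as summary s(·), and the name `abc_rejection` are all assumptions chosen for illustration. Whatever the size of the data, a draw is kept or rejected on the single scalar comparison d(s(y),s(y’))≤ξ.

```python
import random
import statistics

def abc_rejection(y_obs, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated summary falls within `tolerance` of the observed one."""
    rng = random.Random(seed)
    s_obs = statistics.fmean(y_obs)                      # summary s(y): sample mean (assumed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 10.0)                     # assumed prior: N(0, 10^2)
        y_sim = [rng.gauss(theta, 1.0) for _ in y_obs]   # assumed model: N(theta, 1)
        # the whole data set only enters through this scalar event d(s(y), s(y')) <= xi
        if abs(statistics.fmean(y_sim) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted

# toy data, simulated here only for illustration
data_rng = random.Random(1)
y_obs = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

for xi in (2.0, 0.5, 0.1):
    sample = abc_rejection(y_obs, n_draws=5000, tolerance=xi)
    print(f"xi={xi}: {len(sample)} accepted draws")
```

Since the seed is shared across calls, the draws accepted at a smaller ξ form a subset of those accepted at a larger ξ, which is a crude picture of a particle cloud thinning out as the tolerance decreases.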

Judith Rousseau gave her talk on non-parametric mixtures and the possibility to learn parametrically about the component weights. Starting with a rather “magic” result by Allman et al. (2009) that, with three repeated observations per individual, all terms in a mixture are identifiable. Maybe related to the simpler fact that mixtures of Bernoullis are not identifiable while mixtures of Binomials are identifiable, even when n=2. As “shown” in this plot made for X validated. Actually truly related because Allman et al. (2009) prove identifiability through a finite dimensional model. (I am surprised I missed this most interesting paper!) With the side condition that a mixture of r components, each a product of p Bernoullis, is identifiable when p ≥ 2⌈log₂ r⌉ + 1, where log₂ is the base-2 logarithm and ⌈x⌉ the upper rounding. I also find most relevant this distinction between the weights and the remainder of the mixture, as weights behave quite differently, hardly parameters in a sense.
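
A small Python sketch of the two remarks above, under stated assumptions. I read the side condition as a lower bound on the number of binary variables needed for a given number of mixture components (my reading of which symbol plays which role), and I take the repeated observations to be conditionally i.i.d. given the component. The function names and the two example mixtures are made up for illustration, and comparing two specific mixtures is of course an illustration rather than a proof of identifiability.

```python
from itertools import product
from math import ceil, isclose, log2

def min_variables(n_components):
    """Smallest number of binary variables meeting the bound 2*ceil(log2(r)) + 1."""
    return 2 * ceil(log2(n_components)) + 1

def mixture_pmf(weights, probs, n_obs):
    """Joint law of n_obs Bernoulli draws that are i.i.d. given the mixture component."""
    pmf = {}
    for x in product((0, 1), repeat=n_obs):
        k = sum(x)  # number of successes in the pattern x
        pmf[x] = sum(w * q**k * (1 - q)**(n_obs - k) for w, q in zip(weights, probs))
    return pmf

def same_law(pmf_a, pmf_b):
    """True when both laws put (numerically) the same mass on every pattern."""
    return all(isclose(pmf_a[x], pmf_b[x], abs_tol=1e-12) for x in pmf_a)

# two different two-component mixtures sharing the same overall success rate 0.5
mix_a = ((0.5, 0.5), (0.2, 0.8))
mix_b = ((0.5, 0.5), (0.4, 0.6))

print(min_variables(2))                                                # bound for two components
print(same_law(mixture_pmf(*mix_a, n_obs=1), mixture_pmf(*mix_b, n_obs=1)))  # one observation
print(same_law(mixture_pmf(*mix_a, n_obs=3), mixture_pmf(*mix_b, n_obs=3)))  # three observations
```

With a single Bernoulli observation the two mixtures induce the same law, so nothing in the data can tell them apart, while with three repeated observations their joint laws differ.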