Archive for thesis defence

Bayesian thinking for toddlers & Bayesian probabilities for babies [book reviews]

Posted in Statistics on January 27, 2023 by xi'an

My friend E.-J. Wagenmakers sent me a copy of Bayesian Thinking for Toddlers, “a must-have for any toddler with even a passing interest in Ockham’s razor and the prequential principle.” E.-J. wrote the story and Viktor Beekman (of thesis’ cover fame!) drew the illustrations. The book can be read for free at https://psyarxiv.com/w5vbp/, but could not be purchased, as publishers were not interested and self-publishing was not available at a high enough quality level. Hence, in the end, 200 copies were printed as JASP material, with me being the happy owner of one of them. The story follows two young girls competing for dinosaur expertise, being rewarded with cookies in proportion to the probability of providing the correct answer to two dinosaur questions. Toddlers may get less enthusiastic than grown-ups about the message, but they will love the drawings (and the questions, if they are into dinosaurs).

This reminded me of the Bayesian Probabilities for Babies book, by Chris Ferrie, which details the computation of the probability that a cookie contains candy when the first bite holds none. It is more genuinely intended for young kids, in shape and design, as can be checked in a YouTube video, with a hypothetical population of cookies (with and without candy) serving as proxy for the prior distribution. I hope no baby will be traumatised by being exposed too early to the notions of prior and posterior. Only data can tell, twenty years from now, whether the book induced a spike or a collapse in the proportion of Bayesian statisticians!
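The cookie computation is a one-line application of Bayes’ rule. A minimal sketch in Python, with made-up numbers: the prior proportion of candy cookies and the chance a first bite misses the candy are my own assumptions, not the book’s.

```python
# Hypothetical numbers, not taken from the book:
prior_candy = 0.4            # assumed prior: 4 cookies out of 10 hold candy
p_empty_given_candy = 0.5    # assumed chance a first bite misses the candy
p_empty_given_plain = 1.0    # a candy-free cookie always yields an empty bite

# Bayes' rule: P(candy | empty first bite)
joint = p_empty_given_candy * prior_candy
evidence = joint + p_empty_given_plain * (1 - prior_candy)
posterior_candy = joint / evidence
print(posterior_candy)  # approximately 0.25, down from the prior 0.4
```

An empty first bite thus lowers, but does not annihilate, the belief that the cookie holds candy, which is the whole point of the book.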

[Disclaimer about potential self-plagiarism: this post or an edited version will potentially appear in my Books Review section in CHANCE.]

congrats, Dr. Clarté!

Posted in Books, pictures, Statistics, Travel, University life on October 9, 2021 by xi'an

Grégoire Clarté, whom I co-supervised with Robin Ryder, successfully defended his PhD thesis, on sign language classification, ABC-Gibbs, and collective non-linear MCMC, last Wednesday! Congrats to the now Dr Clarté for this achievement, and all the best for his coming Nordic adventure, as he is starting a postdoc at the University of Helsinki, with Aki Vehtari and others. It was quite fun to work with Grégoire over these years, and to discuss an unlimited number of unrelated topics, incl. fantasy books, teas, cooking, and the role of conferences and travel in academic life! The defence itself proved a challenge, as four members of the jury, incl. myself, were “present remotely” and frequently interrupted him owing to gaps in the Teams transmission, which nonetheless broadcast perfectly the honks of the permanent traffic jam at Porte Dauphine… (And we alas could not share a celebratory cup with him!)

parallel MCMC

Posted in Books, Statistics, Travel, University life on September 9, 2020 by xi'an

Yesterday, I remotely took part in the thesis defence of Balazs Nemeth, at Hasselt University, Belgium, as the pandemic conditions were alas still too uncertain to allow for travelling between France and Belgium… The thesis is about parallel strategies for speeding up MCMC, although its title is “Message passing computational methods with pharmacometrics applications”, as the thesis was supported by Johnson & Johnson. (The defence was in English, as I do not understand a word of Dutch…) Among the solutions: distributed affine-invariant sampling à la Goodman & Weare, speculative parallelism for SMC, and an automated parallelisation for hierarchical models that is the core contribution of the thesis. These methods were not about designing new MCMC algorithms, but rather intended to take advantage of parallelisation for existing MCMC algorithms, which meant issues like asynchronicity or data splitting were not considered therein. I nonetheless found the work in the thesis innovative and promising, and the PhD candidate was awarded the title by the jury at the end of the defence!

scalable Metropolis-Hastings, nested Monte Carlo, and normalising flows

Posted in Books, pictures, Statistics, University life on June 16, 2020 by xi'an

Over a sunny if quarantined Sunday, I started reading the PhD dissertation of Rob Cornish, Oxford University, as I was the external member of his viva committee. This ended up in a highly pleasant afternoon discussing the thesis over a (remote) viva yesterday. (If bemoaning a lost opportunity to visit Oxford!) The introduction to the viva was most helpful and set the results within the different time and geographical zones of the PhD, since Rob had to switch from one group of advisors in Engineering to another group in Statistics. Plus an encompassing prospective discussion, expressing pessimism at exact MCMC for complex models and looking forward to further advances in probabilistic programming.

Made of three papers, the thesis includes the ICML 2019 [remember the era when there were conferences?!] paper on scalable Metropolis-Hastings, by Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet, which I commented on last year. Which achieves a remarkable and paradoxical O(1/√n) cost per iteration, provided (global) lower bounds are found on the (local) Metropolis-Hastings acceptance probabilities, since they allow for Poisson thinning à la Devroye (1986), with second order Taylor expansions constructed for all components of the target and the third order derivatives providing the bounds. However, the variability of the acceptance probability gets higher, which induces a longer, if still manageable, running time, provided the concentration of the posterior is in tune with the Bernstein-von Mises asymptotics. I had not paid enough attention in my first read to the strong theoretical justification for the method, relying on the convergence of MAP estimates in well- and (some) mis-specified settings. Now, I would have liked to see the paper dealing with a more complex problem than logistic regression.
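For readers unfamiliar with the thinning trick, here is a generic sketch of Poisson thinning à la Devroye (1986), simulating an inhomogeneous Poisson process by proposing from a dominating homogeneous one. This is only the textbook device, not the factorised Metropolis-Hastings implementation of the paper, and all names are mine.

```python
import random

def thinned_poisson(rate, rate_bound, t_max, rng=None):
    """Sample the event times of an inhomogeneous Poisson process with
    intensity rate(t) <= rate_bound on [0, t_max], by proposing candidates
    from a homogeneous process at rate_bound and keeping each candidate
    with probability rate(t) / rate_bound."""
    rng = rng or random.Random()
    events, t = [], 0.0
    while True:
        t += rng.expovariate(rate_bound)   # next candidate event time
        if t > t_max:
            return events
        if rng.random() < rate(t) / rate_bound:
            events.append(t)               # candidate survives the thinning

# The expected number of events is the integral of rate(t);
# for rate(t) = t on [0, 1] this integral equals 1/2.
rng = random.Random(42)
mean = sum(len(thinned_poisson(lambda t: t, 1.0, 1.0, rng))
           for _ in range(20_000)) / 20_000
```

The point exploited by the paper is that only the bound needs to be global: the exact intensity (there, the per-datum acceptance factors) is evaluated only at the retained candidates.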

The second paper in the thesis is an ICML 2018 proceedings paper by Tom Rainforth, Robert Cornish, Hongseok Yang, Andrew Warrington, and Frank Wood, which considers Monte Carlo problems involving several nested expectations in a non-linear manner, meaning that (a) several levels of Monte Carlo approximations are required, with associated asymptotics, and (b) the resulting overall estimator is biased. This includes common doubly intractable posteriors, obviously, as well as (Bayesian) design and control problems. [And it has nothing to do with nested sampling.] The resolution chosen by the authors is strictly plug-in, in that they replace each level in the nesting with a Monte Carlo substitute and do not attempt to reduce the bias. Which means a wide range of solutions (other than the plug-in one) could have been investigated, including bootstrap maybe. For instance, Bayesian design is presented as an application of the approach, but since it relies on the log-evidence, there exist several versions for estimating (unbiasedly) this log-evidence. Similarly, the Forsythe-von Neumann technique applies to arbitrary transforms of a primary integral. The central discussion dwells on the optimal choice of the volume of simulations at each level, optimal in terms of asymptotic MSE. Or rather an asymptotic bound on the MSE. The interesting result being that the outer expectation requires the square of the number of simulations used for the other expectations. Which all need to converge to infinity. A trick in finding an estimator for a polynomial transform reminded me of the SAME algorithm, in that it duplicated the simulations as many times as the highest power of the polynomial. (The ‘Og briefly reported on this paper… four years ago.)
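The plug-in strategy is easy to sketch: each inner expectation is replaced by its own Monte Carlo average before the outer, non-linear transform is applied, which is exactly what makes the overall estimator biased. A minimal Python illustration, where the function names and the toy Gaussian target are mine, not the paper’s:

```python
import random

def nested_mc(f, g, sample_y, sample_x_given_y, n_outer, n_inner, rng=None):
    """Plug-in nested Monte Carlo estimate of E_y[ f( E_x[ g(x, y) ] ) ].
    The inner expectation is replaced by a Monte Carlo average, hence the
    estimator is biased (at order 1/n_inner) whenever f is non-linear."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_outer):
        y = sample_y(rng)
        inner = sum(g(sample_x_given_y(y, rng), y)
                    for _ in range(n_inner)) / n_inner
        total += f(inner)
    return total / n_outer

# Toy target: y ~ N(0,1), x | y ~ N(y,1), g(x, y) = x, f(t) = t**2, so
# E_y[ f( E_x[g] ) ] = E[y**2] = 1, approached up to an O(1/n_inner) bias.
# Outer budget set to the square of the inner one, matching the paper's rate.
est = nested_mc(lambda t: t * t, lambda x, y: x,
                lambda rng: rng.gauss(0, 1),
                lambda y, rng: rng.gauss(y, 1),
                n_outer=10_000, n_inner=100, rng=random.Random(0))
```

With f(t)=t², the bias of the plug-in estimator is exactly the inner Monte Carlo variance, 1/n_inner, which illustrates why the inner sample sizes must also grow to infinity.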

The third and last part of the thesis is a proposal [to appear in ICML 20] on relaxing bijectivity constraints in normalising flows, with continuously indexed flows. (Or CIFs. As Rob made a joke about this cleaning brand, let me add (?) to that joke by mentioning that looking at CIF and bijections is less dangerous, in a Trump cum COVID era, than CIF and injections!) With Anthony Caterini, George Deligiannidis and Arnaud Doucet as co-authors. I am much less familiar with this area and hence a wee bit puzzled at the purpose of removing what I understand to be an appealing side of normalising flows, namely that they produce a manageable representation of density functions as a combination of bijective and differentiable functions of a baseline random vector, like a standard Normal vector. The argument made in the paper is that this representation of the density imposes a constraint on the topology of its support, since said support is homeomorphic to the support of the baseline random vector. While the supporting theoretical argument is a mathematical theorem showing that the Lipschitz constant of the transform must grow to infinity when the two supports are topologically different, these arguments may be overly theoretical when faced with the practical implications of the replacement strategy. I somewhat miss its overall strength, given that the whole point seems to be in approximating a density function based on a finite sample.

Bayesian inference with no likelihood

Posted in Books, Statistics, University life on January 28, 2020 by xi'an

This week I made a quick trip to Warwick for the defence (or viva) of the PhD thesis of Jack Jewson, containing novel perspectives on constructing Bayesian inference without a likelihood, or without complete trust in said likelihood. The thesis aimed at constructing minimum divergence posteriors in an M-open perspective and built a rather coherent framework from principles to implementation. There is a clear link with the earlier work of Bissiri et al. (2016), with further consistency constraints where the outcome must recover the true posterior in the M-closed scenario (which is not always the case with the procedures proposed in the thesis).

Although I am partial to the use of empirical likelihoods in this setting, I appreciated the position of the thesis and the discussion of the various divergences towards the posterior derivation (already discussed on this blog), with interesting perspectives on the calibration of the pseudo-posterior à la Bissiri et al. (2016). Among other things, the thesis pointed out a departure from the likelihood principle and some of its most established consequences, like Bayesian additivity. In that regard, there were connections with generative adversarial networks (GANs) and their Bayesian versions that could have been explored. And an impression that the type of Bayesian robustness explored in the thesis has more to do with outliers than with misspecification. Epsilon-contamination models are quite specific as it happens, in terms of tails and other things.

The next chapter is somewhat “less” Bayesian in my view, as it considers a generalised form of variational inference. I agree that the view of the posterior as the solution to an optimisation problem is tempting, but changing the objective function makes the notion less precise. Which makes reading it somewhat delicate, as it seems to dilute the meaning of both prior and posterior to the point of becoming irrelevant.

The last chapter on change-point models is quite alluring, in that it capitalises on the previous developments to analyse a fairly realistic if traditional problem, applied to traffic in London, prior and posterior to the congestion tax. However, there is always an issue with robustness and outliers, in that the notion is somewhat vague or informal. Things start clarifying at the end, but I find it surprising that conjugate priors emerge as robust optimal solutions, since the usual folk theorem from the ’80s is that they are not robust.
