Archive for George Casella

step-dads with Bayesian design [One World ABC’minar, 21 March]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on March 18, 2024 by xi'an

The next One World ABC seminar is taking place (on-line, requiring pre-registration) on Thursday 21 March, 9:00am UK time, with Desi Ivanova (University of Oxford), speaking about Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design:

We develop a semi-amortized, policy-based, approach to Bayesian experimental design (BED) called Step-wise Deep Adaptive Design (Step-DAD). Like existing, fully amortized, policy-based BED approaches, Step-DAD trains a design policy upfront before the experiment. However, rather than keeping this policy fixed, Step-DAD periodically updates it as data is gathered, refining it to the particular experimental instance. This allows it to improve both the adaptability and the robustness of the design strategy compared with existing approaches.

(Which reminded me of George’s book on design in 2008.)

insufficient Gibbs sampling

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , on July 29, 2023 by xi'an

We have just arXived our paper on insufficient Gibbs sampling with Antoine Luciano and Robin Ryder, from Université Paris Dauphine. This is Antoine’s first paper and part of his PhD. (In particular, he wrote the entire code.) The idea stemmed from a discussion on ABC benchmarks, like the one when the pair (median, MAD) is the only available observation. With no available joint density, the setting seems to prohibit calling for an MCMC sampler. However, simulating the complete data set conditional on these statistics proves feasible, with a bit of bookkeeping. With obviously much better results [demonstrated above for a Cauchy example] than when calling ABC and at a very similar cost. (If not accounting for the ability of ABC to be parallelised.)  The idea can be extended to other settings, obviously, as long as completion remains achievable. (And a big thanks to our friend Ed George who suggested the title, while at CIRM. I had suggested “Gibbs for boars” as a poster title, in connection with the historical time-line of

Gibbs for Kids (Casella and George) — Gibbs for Pigs (Gianola) — Gibbs for Robust Pigs = Gibbs for Boars

and the abundance of boars on the Luminy campus, but this did not sound convincing enough for Antoine.)

probabilistic numerics [book review]

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , , , , , , , , , , , , , on July 28, 2023 by xi'an

Probabilistic numerics: Computation as machine learning is a 2022 book by Philipp Henning, Michael Osborne, and Hans Kersting that was sent to me by CUP (upon my request and almost free of charge, as I had to pay custom charges, thanks to Brexit!). With the important message of bringing statistical tools to numerics. I remember Persi Diaconis calling for (such) actions in the 1980’s (and even reading a paper of his on the topic along with George Casella in Ithaca while waiting for his car to get serviced!).

From a purely aesthetic view point, the book reads well, offers a beautiful cover and sells for a quite reasonable price for an academic book. Plus it is associated with a website containing draft version of the book. Code, links to courses, research, conferences are also available there. Just a side remark that it enjoys very wide margins that may have encouraged an inflation of footnotes (but also exercises). Except when formulas get in the way (as e.g. on p.40).

The figure below is an excerpt from the introduction that sets the scene of probabilistic numerics involving algorithms as agents, gathering data and making decisions, with an obvious analogy with standard Bayesian decision theory. Modelling uncertainty missing from the picture (if not from the book, as explained later by the authors as an argument against attaching the label Bayesian to the field). Also referring to Henri Poincaré for the origination of the prior vs posterior uncertainty about a mathematical quantity. Followed by early works from the Russian school of probability, somewhat ignored until the machine-learning revolution and a 2012 NIPS workshop organised by the authors. (I participated to a follow-up workshop at NIPS 2015.)

In this nicely written section, I have an objection to the authors’ argument that a frequentist, as opposed to a Bayesian, “has the loss function in mind from the outset” (p.9), since the loss function is logically inseparable from the prior and considered from the onset. I also like very much the conclusion to that introduction, namely that the main messages (from the book) are that (verbatim)

  • classical methods are probabilist (p.10)
  • numerical methods are autonomous agents (p.11)
  • numerics should not be random (if not a rejection of the concept of Monte Carlo methods, p.1, but probabilistic numerics being opposed to stochastic numerics, p.67)
  • numerics must report calibrated uncertainty (p.12)
  • imprecise computation is to be embraced (p.12)
  • probabilistic numerics consolidates numerical computation and statistical inference (p.13)
  • probabilistic numerical algorithms are already adding value (p.13)
  • pipelines of computation demand harmonisation

“Is it still reasonable to labour under computational constraints conceived in the 1940s?” (p.113)

“rather than being equally good for any number of dimensions, Monte Carlo is perhaps better thought of as being equally bad” (p.110)

Chapter I is a 40p infodump (!) on mathematical concepts needed for the following parts. Chapter II is about integration, opposing again PN and Monte Carlo (with strange remark that MCMC does not achieve √N convergence rate, p.72). In the sense that the later is frequentist in that it does not use a prior [unless considering a limiting improper version as in Section 12.2, an intriguing concept in this setup as I wonder whether or not improper priors can at all be contemplated] on the object of interest and hence that the stochasticity does not reflect uncertainty but rather the impact of the simulated sample. Advocating Bayesian quadrature (with some weird convergence graphs exhibiting a high variability with the number of iterations that apparently is not discussed) and bringing in the fascinating perspective of model choice in that framework (leading to compute a posterior probability for each model!). Being evidently biased towards Monte Carlo, I find the opposition in Chapter 12 unnecessarily antagonistic, while presenting Monte Carlo methods as a form of minimax solution, the more because quasi-Monte Carlo methods are hardly discussed (or dismissed). As illustrated by the following picture (p.115) and the above quotes. (And I won’t even go into the absurdity of §12.3 trashing pseudo-random generators as “painfully dumb”.)

Chapter III is a sort of dual of Chapter II for linear algebra numerics, primarily solving linear equations by Gaussian solvers, which introduces new concepts like Krylov sequences, although it sounds quite specific (for an outsider like me). Chapters IV and V deal with the more ambitious prospect of optimisation. Reconsidering classics and expanding into Bayesian optimisation, using Gaussian process priors and defining specific loss functions. Bringing in a strong link with machine learning tools and goals. [citation typo on p.277]. Chapter VII addresses the resolution of ODEs by a Bayesian state space model representation and (again!) Gaussian processes. Reaching to mentioning inverse problems and offering a short finale on prospective steps for interested readers.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

signed mixtures [X’ed]

Posted in Books, Kids, Statistics with tags , , , , , , , , on March 26, 2023 by xi'an

Following a question on X validated, the hypoexponential distribution, I came across (for the second time) a realistic example of a mixture (of exponentials) whose density wrote as a signed mixture, i.e. involving both negative and positive weights (with sum still equal to one). Namely,

\displaystyle f(x)=\sum_i^d \lambda_i e^{-\lambda_ix}\prod_{j=1,i\neq j}^{d}\frac{\lambda_j}{\lambda_j-\lambda_i}\quad x,\lambda_j>0

representing the density of a sum of d Exponential variates. The above is only well-defined when all rates differ, while a more generic definition involving matrix exponentiation exists. But the case when (only) two rates are equal can rather straightforwardly be derived by a direct application of L’Hospital rule, which my friend George considered as the number one calculus rule!

ISBA 2021.1

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , , , on June 29, 2021 by xi'an

An infinite (mixture) session was truly the first one I could attend on Day 1, as a heap of unexpected last minute issues kept me busy or on hedge for the beginning of the day (if not preventing me from a dawn dip in Calanque de Morgiou). Using the CIRM video system for zoom talked required more preparation than I had thought and we made it barely in time for the first session, while I had to store zoom links for all speakers present in Luminy.  Plus allocate sessions to the rooms provided by CIRM, twice since there was a mishap with the other workshop present at CIRM. And reassuring speakers, made anxious by the absence of a clear schedule. Chairing the second ABC session was also a tense moment, from checking every speaker could connect and share slides, to ensuring they kept on schedule (and they did on both!, ta’), to checking for questions at the end. Spotting a possible connection between Takuo Mastubara’s Stein’s approximation for in the ABC setup and a related paper by Liu and Lee I had read just a few days ago. Alas, it was too early to relax as an inverter in the CIRM room burned and led to a local power failure. Fortunately this was restored prior to the mixture session! (As several boars were spotted on the campus yesternight, I hope no tragic encounter happens before the end of the meeting!!!) So the mixture session proposed new visions on infering K, the number of components, some of which reminded me of… my first talk at CIRM where I was trying to get rid of empty components at each MCMC step, albeit in a much more rudimentary way obviously. And later had the wonderful surprise of hearing Xiao-Li’s lecture start by an excerpt from Car Talk, the hilarious Sunday morning radio talk-show about the art of used car maintenance on National Public Radio (NPR) that George Casella could not miss (and where a letter he wrote them about a mistaken probability computation was mentioned!). The final session of the day was an invited ABC session I chaired (after being exfiltrated from the CIRM dinner table!) with Kate Lee, Ryan Giordano, and Julien Stoehr as speakers. Besides Julien’s talk on our Gibbs-ABC paper, both other talks shared a concern with the frequentist properties of the ABC posterior, either to be used as a control tool or as a faster assessment of the variability of the (Monte Carlo) ABC output.