Archive for machine learning

Privacy-preserving Computing [book review]

Posted in Books, Statistics with tags , , , , , , , , , , , , , , on May 13, 2024 by xi'an

Privacy-preserving Computing for Big Data Analytics and AI, by Kai Chen and Qiang Yang, is a rather short 2024 CUP book translated from the 2022 Chinese version (by the authors).  It covers secret sharing, homomorphic encryption, oblivious transfer, garbled circuit, differential privacy, trusted execution environment, federated learning, privacy-preserving computing platforms, and case studies. The style is survey-like, meaning it often is too light for my liking, with too many lists of versions and extensions, and more importantly lacking in detail to rely (solely) on it for a course. At several times standing closer to a Wikipedia level introduction to a topic. For instance, the chapter on homomorphic encryption [Chap.5] does not connect with the (presumably narrow) picture I have of this method. And the chapter on differential privacy [Chap.6] does not get much further than Laplace and Gaussian randomization, as in eg the stochastic gradient perturbation of Abadi et al. (2016) the privacy requirement is hardly discussed. The chapter on federated leaning [Chap.8] is longer if not much more detailed, being based on a entire book on Federated learning whose Qiang Yang is the primary author. (With all figures in that chapter being reproduced from said book.)  The next chapter [Chap.9] describes to some extent several computing platforms that can be used for privacy purposes, such as FATE, CryptDB, MesaTEE, Conclave, and PrivPy, while the final one goes through case studies from different areas, but without enough depth to be truly formative for neophyte readers and students. Overall, too light for my liking.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

mostly MC [April]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , , on April 5, 2024 by xi'an

off to Bristol

Posted in Statistics with tags , , , , , , , , , , , , , on March 24, 2024 by xi'an

probabilistic numerics [book review]

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , , , , , , , , , , , , , on July 28, 2023 by xi'an

Probabilistic numerics: Computation as machine learning is a 2022 book by Philipp Henning, Michael Osborne, and Hans Kersting that was sent to me by CUP (upon my request and almost free of charge, as I had to pay custom charges, thanks to Brexit!). With the important message of bringing statistical tools to numerics. I remember Persi Diaconis calling for (such) actions in the 1980’s (and even reading a paper of his on the topic along with George Casella in Ithaca while waiting for his car to get serviced!).

From a purely aesthetic view point, the book reads well, offers a beautiful cover and sells for a quite reasonable price for an academic book. Plus it is associated with a website containing draft version of the book. Code, links to courses, research, conferences are also available there. Just a side remark that it enjoys very wide margins that may have encouraged an inflation of footnotes (but also exercises). Except when formulas get in the way (as e.g. on p.40).

The figure below is an excerpt from the introduction that sets the scene of probabilistic numerics involving algorithms as agents, gathering data and making decisions, with an obvious analogy with standard Bayesian decision theory. Modelling uncertainty missing from the picture (if not from the book, as explained later by the authors as an argument against attaching the label Bayesian to the field). Also referring to Henri Poincaré for the origination of the prior vs posterior uncertainty about a mathematical quantity. Followed by early works from the Russian school of probability, somewhat ignored until the machine-learning revolution and a 2012 NIPS workshop organised by the authors. (I participated to a follow-up workshop at NIPS 2015.)

In this nicely written section, I have an objection to the authors’ argument that a frequentist, as opposed to a Bayesian, “has the loss function in mind from the outset” (p.9), since the loss function is logically inseparable from the prior and considered from the onset. I also like very much the conclusion to that introduction, namely that the main messages (from the book) are that (verbatim)

  • classical methods are probabilist (p.10)
  • numerical methods are autonomous agents (p.11)
  • numerics should not be random (if not a rejection of the concept of Monte Carlo methods, p.1, but probabilistic numerics being opposed to stochastic numerics, p.67)
  • numerics must report calibrated uncertainty (p.12)
  • imprecise computation is to be embraced (p.12)
  • probabilistic numerics consolidates numerical computation and statistical inference (p.13)
  • probabilistic numerical algorithms are already adding value (p.13)
  • pipelines of computation demand harmonisation

“Is it still reasonable to labour under computational constraints conceived in the 1940s?” (p.113)

“rather than being equally good for any number of dimensions, Monte Carlo is perhaps better thought of as being equally bad” (p.110)

Chapter I is a 40p infodump (!) on mathematical concepts needed for the following parts. Chapter II is about integration, opposing again PN and Monte Carlo (with strange remark that MCMC does not achieve √N convergence rate, p.72). In the sense that the later is frequentist in that it does not use a prior [unless considering a limiting improper version as in Section 12.2, an intriguing concept in this setup as I wonder whether or not improper priors can at all be contemplated] on the object of interest and hence that the stochasticity does not reflect uncertainty but rather the impact of the simulated sample. Advocating Bayesian quadrature (with some weird convergence graphs exhibiting a high variability with the number of iterations that apparently is not discussed) and bringing in the fascinating perspective of model choice in that framework (leading to compute a posterior probability for each model!). Being evidently biased towards Monte Carlo, I find the opposition in Chapter 12 unnecessarily antagonistic, while presenting Monte Carlo methods as a form of minimax solution, the more because quasi-Monte Carlo methods are hardly discussed (or dismissed). As illustrated by the following picture (p.115) and the above quotes. (And I won’t even go into the absurdity of §12.3 trashing pseudo-random generators as “painfully dumb”.)

Chapter III is a sort of dual of Chapter II for linear algebra numerics, primarily solving linear equations by Gaussian solvers, which introduces new concepts like Krylov sequences, although it sounds quite specific (for an outsider like me). Chapters IV and V deal with the more ambitious prospect of optimisation. Reconsidering classics and expanding into Bayesian optimisation, using Gaussian process priors and defining specific loss functions. Bringing in a strong link with machine learning tools and goals. [citation typo on p.277]. Chapter VII addresses the resolution of ODEs by a Bayesian state space model representation and (again!) Gaussian processes. Reaching to mentioning inverse problems and offering a short finale on prospective steps for interested readers.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

statistical modeling with R [book review]

Posted in Books, Statistics with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , on June 10, 2023 by xi'an

Statistical Modeling with R (A dual frequentist and Bayesian approach for life scientists) is a recent book written by Pablo Inchausti, from Uruguay. In a highly personal and congenial style (witness the preface), with references to (fiction) books that enticed me to buy them. The book was sent to me by the JASA book editor for review and I went through the whole of it during my flight back from Jeddah. [Disclaimer about potential self-plagiarism: this post or a likely edited version of it will eventually appear in JASA. If not CHANCE, for once.]

The very first sentence (after the preface) quotes my late friend Steve Fienberg, which is definitely starting on the right foot. The exposition of the motivations for writing the book is quite convincing, with more emphasis than usual put on the notion and limitations of modeling. The discourse is overall inspirational and contains many relevant remarks and links that make it worth reading it as a whole. While heavily connected with a few R packages like fitdist, fitistrplus, brms (a  front for Stan), glm, glmer, the book is wisely bypassing the perilous reef of recalling R bases. Similarly for the foundations of probability and statistics. While lacking in formal definitions, in my opinion, it reads well enough to somehow compensate for this very lack. I also appreciate the coherent and throughout continuation of the parallel description of Bayesian and non-Bayesian analyses, an attempt that often too often quickly disappear in other books. (As an aside, note that hardly anyone claims to be a frequentist, except maybe Deborah Mayo.) A new model is almost invariably backed by a new dataset, if a few being somewhat inappropriate as in the mammal sleep patterns of Chapter 5. Or in Fig. 6.1.

Given that the main motivation for the book (when compared with references like BDA) is heavily towards the practical implementation of statistical modelling via R packages, it is inevitable that a large fraction of Statistical Modeling with R is spent on the analysis of R outputs, even though it sometimes feels a wee bit too heavy for yours truly.  The R screen-copies are however produced in moderate quantity and size, even though the variations in typography/fonts (at least on my copy?!) may prove confusing. Obviously the high (explosive?) distinction between regression models may eventually prove challenging for the novice reader. The specific issue of prior input (or “defining priors”) is briefly addressed in a non-chapter (p.323), although mentions are made throughout preceding chapters. I note the nice appearance of hierarchical models and experimental designs towards the end, but would have appreciated some discussions on missing topics such as time series, causality, connections with machine learning, non-parametrics, model misspecification. As an aside, I appreciated being reminded about the apocryphal nature of Ockham’s much cited quotePluralitas non est ponenda sine necessitate“.

Typo Jeffries found in Fig. 2.1, along with a rather sketchy representation of the history of both frequentist and Bayesian statistics. And Jon Wakefield’s book (with related purpose of presenting both versions of parametric inference) was mistakenly entered as Wakenfield’s in the bibliography file. Some repetitions occur. I do not like the use of the equivalence symbol ≈ for proportionality. And I found two occurrences of the unavoidable “the the” typo (p.174 and p.422). I also had trouble with some sentences like “long-run, hypothetical distribution of parameter estimates known as the sampling distribution” (p.27), “maximum likelihood estimates [being] sufficient” (p.28), “Jeffreys’ (1939) conjugate priors” [which were introduced by Raiffa and Schlaifer] (p.35), “A posteriori tests in frequentist models” (p.130), “exponential families [having] limited practical implications for non-statisticians” (p.190), “choice of priors being correct” (p.339), or calling MCMC sample terms “estimates” (p.42), and issues with some repetitions, missing indices for acronyms, packages, datasets, but did not bemoan the lack homework sections (beyond suggesting new datasets for analysis).

A problematic MCMC entry is found when calibrating the choice of the Metropolis-Hastings proposal towards avoiding negative values “that will generate an error when calculating the log-likelihood” (p.43) since it suggests proposed values should not exceed the support of the posterior (and indicates a poor coding of the log-likelihood!). I also find the motivation for the full conditional decomposition behind the Gibbs sampler (p.47) unnecessarily confusing. (And automatically having a Metropolis-Hastings step within Gibbs as on Fig. 3.9 brings another magnitude of confusion.) The Bayes factor section is very terse. The derivation of the Kullback-Leibler representation (7.3) as an expected log likelihood ratio seems to be missing a reference measure. Of course, seeing a detailed coverage of DIC (Section 7.4) did not suit me either, even though the issue with mixtures was alluded to (with no detail whatsoever). The Nelder presentation of the generalised linear models felt somewhat antiquated, since the addition of the scale factor a(φ) sounds over-parameterized.

But those are minor quibble in relation to a book that should attract curious minds of various background knowledge and expertise in statistics, as well as work nicely to support an enthusiastic teacher of statistical modelling. I thus recommend this book most enthusiastically.