Mathematical underpinnings of Analytics (theory and applications)
“Today, a week or two spent reading Jaynes’ book can be a life-changing experience.” (p.8)
I received this book by Peter Grindrod, Mathematical underpinnings of Analytics (theory and applications), from Oxford University Press quite a while ago. (Not that long ago, since the book was published in 2015.) It came as a book for review for CHANCE, and I let it sit on my desk and in my travel bag for the same while, as it was unclear to me how it was connected with Statistics and CHANCE. What is [are?!] analytics?! I did not find much of a definition of analytics when I at last opened the book, and even fewer mentions of statistics or machine-learning, but Wikipedia told me the following:
“Analytics is a multidimensional discipline. There is extensive use of mathematics and statistics, the use of descriptive techniques and predictive models to gain valuable knowledge from data—data analysis. The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology.”
Barring the absurdity of speaking of a “multidimensional discipline” [and even worse of linking with the mathematical notion of dimension!], this tells me analytics is a mix of data analysis and decision making. Hence relying on (some) statistics. Fine.
“Perhaps in ten years’ time, the mathematics of behavioural analytics will be commonplace: every mathematics department will be doing some of it.” (p.10)
First, and to start with some positive words (!), a book that quotes both Friedrich Nietzsche and Patti Smith cannot get everything wrong! (Of course, including a most likely apocryphal quote from the now late Yogi Berra does not fall into this category!) Second, from a general perspective, I feel the book meanders its way through chapters towards a higher level of statistical consciousness, from graphs to clustering to hidden Markov models, without precisely mentioning statistics or statistical models, while insisting very much upon Bayesian procedures and Bayesian thinking. Overall, I can relate to most items mentioned in Peter Grindrod’s book, but mostly by first reconstructing the notions behind them. While I personally appreciate the distanced and often ironic tone of the book, reflecting the author’s experience in retail modelling, I am thus wondering which audience Mathematical underpinnings of Analytics is aiming at, for a practitioner would have a hard time bridging the gap between the concepts exposed therein and one’s practice, while a theoretician would require more formal and deeper entries on the topics broached by the book. I just doubt this entry will be enough to lead maths departments to adopt behavioural analytics as part of their curriculum…
“In applications, you have the data that you observe; you cannot change it.” (p.35)
The first two chapters are about random graphs and networks, with dynamic versions in Chapter 2. While I understand the mathematical concepts derived from such objects, I feel those chapters miss out on the important issues of model building and model estimation: I do not see how to conduct inference on such data by using only an eigenvalue decomposition. “Bayesian probability” (p.13) and Bayes’ theorem (p.22, p.41) are mentioned, albeit with no clear connection to statistical inference. (Typo: the expectation at the top of p.39 is improperly defined.)
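To make the eigenvalue route concrete, here is a minimal sketch (my own illustration, not code from the book) of the kind of spectral computation such chapters rely on, applied to a simulated Erdős–Rényi graph:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 0.2

# symmetric Erdos-Renyi adjacency matrix, no self-loops
upper = rng.random((n, n)) < p
A = np.triu(upper, k=1)
A = (A + A.T).astype(float)

# eigen-decomposition of the (symmetric) adjacency matrix
eigvals, eigvecs = np.linalg.eigh(A)

# the leading eigenvalue concentrates near the average degree
# (about n*p) for a dense Erdos-Renyi graph
leading = eigvals[-1]
print(leading)
```

This gives a descriptive spectral summary of the graph, but, as noted above, it is not by itself an inferential procedure: nothing here estimates parameters of a generative model or quantifies uncertainty.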
“In a situation with many [independent] observations (…) an overall Bayes factor may be formed by multiplying all the individual ones together.” (p.118)
While the title of Chapter 3 is a puzzle (“Structure and responsiveness”), and most of its contents are about deterministic and randomised dynamical models (like Lorenz’ attractor), this chapter introduces the bootstrap and Bayes factors. The above quote got me puzzled until I wrote down what it means: it is true that one can express the Bayes factor, i.e., the ratio of marginal likelihoods, as a product of ratios of conditional marginals. Just like any ratio of joint densities. However, this is not particularly useful: when getting a new observation, there is no simple way of updating the current Bayes factor, I mean nothing simpler than computing the new marginals for the entire sample. (Unless one engages in a particle filter implementation, in which case approximations to the Bayes factors are indeed updated sequentially. I do not think this is what the author has in mind.) Another issue (and far from a philosophical one) with the introduction of the Bayes factor in the book: there is no warning of any sort about the use of improper priors in Bayes factors. All this makes me wonder whether the author is de facto using instead an alternate version of the Bayes factor with plug-in estimates rather than integrals. Another entry that got me confused for a while, until I reached the Appendix (p.230), is the use of “improper” for both infinite-mass and unnormalised densities, as well as of “hypothesis” for both “model” and “component of a mixture”.
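To spell out why the quote puzzled me, the telescopic decomposition of the Bayes factor reads

```latex
\[
B_{01}(x_{1:n}) \;=\; \frac{m_1(x_{1:n})}{m_0(x_{1:n})}
\;=\; \prod_{i=1}^{n} \frac{m_1(x_i \mid x_{1:i-1})}{m_0(x_i \mid x_{1:i-1})},
\qquad
m_k(x_i \mid x_{1:i-1}) \;=\; \int f_k(x_i \mid \theta_k)\,
\pi_k(\theta_k \mid x_{1:i-1})\, \mathrm{d}\theta_k .
\]
```

Each factor involves the posterior given all earlier observations, so the product of “individual” Bayes factors $m_1(x_i)/m_0(x_i)$ only equals the overall Bayes factor when the marginals factorise, e.g., with fixed (plug-in) parameter values, which is consistent with my suspicion above.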
“One should never naïvely apply EM-type methods. Just look at the data first.” (p.145)
Chapter 4 covers the EM algorithm(s), for mixture estimation and clustering, with applications to behavioural and consumer data. Nothing to comment on, except that it is not Bayesian in the least. And still the chapter concludes with the personal remarks that “probability theory and especially Bayesian probability theory is often crowded out of our mathematics undergraduate courses” (p.145) and that “the rules of conditional probability may be derived from functional equations” (p.146).
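For readers meeting EM here for the first time, a minimal two-component Gaussian mixture fit (my own sketch, not the book's algorithm) looks like this:

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_mixture(x, n_iter=100):
    """EM for a two-component univariate Gaussian mixture."""
    # crude initialisation from the data quantiles
    mu = np.quantile(x, [0.25, 0.75])
    var = np.array([x.var(), x.var()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component
        dens = w * np.stack([normal_pdf(x, mu[k], var[k]) for k in (0, 1)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means, variances
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# simulated data from two well-separated components
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
w, mu, var = em_mixture(x)
```

This is a pure maximum-likelihood procedure, which illustrates the point above: nothing Bayesian is involved, no prior and no posterior.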
“Personally, I greatly dislike artificial neural networks (…) What I object to most of all though is the appropriation of the clothes of human neural cognitive processing.” (p.165)
The following chapter is mistakenly entitled “Multiple hypothesis testing” when it actually addresses the simultaneous comparison of several models and the selection of the most appropriate one, rather than multiple testing in the classical sense. When the author uses logistic or polytomous regression to separate classes (or hypotheses, as put in the book), the regression coefficient β is not handled in a Bayesian perspective but replaced by the ML estimator, which is fine per se but makes the insistence on all things Bayesian ring somewhat hollow. In the following chapter, the linear model is handled via a conjugate Gaussian prior on the regression coefficient and a log-normal prior on the variance, but I did not spot a discussion of the use of the posterior to assess uncertainty in the estimation of the coefficients.
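On the missing discussion of posterior uncertainty: with a conjugate Gaussian prior and a known noise variance (a simplification of the book's setting, which also puts a prior on the variance), the posterior of the regression coefficients is available in closed form and directly yields credible intervals. A hypothetical sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma2 = 1.0  # noise variance, assumed known for simplicity
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# conjugate Gaussian prior: beta ~ N(0, tau2 * I)
tau2 = 10.0
prec_post = X.T @ X / sigma2 + np.eye(p) / tau2   # posterior precision
cov_post = np.linalg.inv(prec_post)               # posterior covariance
mean_post = cov_post @ (X.T @ y / sigma2)         # posterior mean

# 95% marginal credible intervals from the Gaussian posterior
sd = np.sqrt(np.diag(cov_post))
lower, upper = mean_post - 1.96 * sd, mean_post + 1.96 * sd
```

The point is that the posterior covariance comes for free with the conjugate analysis, so reporting only a point estimate discards exactly the uncertainty assessment a Bayesian treatment is supposed to deliver.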
“Although genetic algorithms can be applied blindly (…) it is productive to apply the method sympathetically, in the light of prior knowledge.” (p.194)
In the final chapter, finite Markov chains are used to model customer behaviour, with a mention of hidden Markov models, along with an introduction to genetic algorithms as a generic optimisation method. Which is fine but leaves the reader wondering about the calibration of such algorithms. It also insists on Laplace’s succession rule as highly central to Bayesian statistics, even though “many books on probability theory try to ignore it” (p.234).
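For reference, Laplace's succession rule gives the posterior predictive probability of a further success after s successes in n trials under a uniform Beta(1,1) prior, namely (s+1)/(n+2); a two-line illustration:

```python
from fractions import Fraction

def laplace_succession(successes, trials):
    # posterior predictive probability of one more success
    # under a uniform Beta(1,1) prior on the success rate
    return Fraction(successes + 1, trials + 2)

# the sun has risen on all 9 observed days
print(laplace_succession(9, 9))  # prints 10/11
```

Note that with no data at all the rule returns 1/2, i.e., the prior predictive under the uniform prior.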
“What a strange thing probability is (…) So let us start probability theory all over again. Let us leave aside what we have learned up to now.” (p.220)
The book also contains a 20-page appendix on the basics of Bayesian reasoning that reads to me as too much preaching and not technical enough for its purpose. Besides misdating Bayes’ posthumous publication of the Essay by one year, a very venial sin!, I fear the appendix spends too much time on Bayes’ theorem (with classical examples like the prosecutor’s fallacy, the Monty Hall problem, and similar notions), and not enough on the use of posterior distributions as reflections of the uncertainty about our estimates. (Note that the appendix sometimes refers to subsection numbers that do not appear in this version.)
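Since the appendix leans on such classical examples, the Monty Hall conclusion (switching wins with probability 2/3) is at least easy to check by simulation; a quick sketch of my own:

```python
import random

def monty_hall(switch, n_trials=100_000, seed=0):
    """Estimate the win probability of the stay/switch strategies."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # the host opens a goat door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # move to the one remaining closed door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / n_trials

print(monty_hall(switch=True))   # close to 2/3
print(monty_hall(switch=False))  # close to 1/3
```

Of course, as argued above, such puzzles illustrate Bayes' theorem as a probability identity rather than posterior distributions as measures of estimation uncertainty.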