## golden Bayesian!

Posted in Statistics with tags , , , , , , , , , on November 11, 2017 by xi'an

## computational methods for numerical analysis with R [book review]

Posted in Books, Kids, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , , on October 31, 2017 by xi'an

This is a book by James P. Howard, II, I received from CRC Press for review in CHANCE. (As usual, the customary warning applies: most of this blog post will appear later in my book review column in CHANCE.) It consists in a traditional introduction to numerical analysis with backup from R codes and packages. The early chapters are setting the scenery, from basics on R to notions of numerical errors, before moving to linear algebra, interpolation, optimisation, integration, differentiation, and ODEs. The book comes with a package cmna that reproduces algorithms and testing. While I do not find much originality in the book, given its adherence to simple resolutions of the above topics, I could nonetheless use it for an elementary course in our first year classes. With maybe the exception of the linear algebra chapter that I did not find very helpful.

“…you can have a solution fast, cheap, or correct, provided you only pick two.” (p.27)

The (minor) issue I have with the book and that a potential mathematically keen student could face as well is that there is little in the way of justifying a particular approach to a given numerical problem (as opposed to others) and in characterising the limitations and failures of the presented methods (although this happens from time to time as e.g. for gradient descent, p.191). [Seeping in my Gallic “mal-être”, I am prone to over-criticise methods during classing, to the (increased) despair of my students!, but I also feel that avoiding over-rosy presentations is a good way to avoid later disappointments or even disasters.] In the case of this book, finding [more] ways of detecting would-be disasters would have been nice.

An uninteresting and highly idiosyncratic side comment is that the author preferred the French style for long division to the American one, reminding me of my first exposure to the latter, a few months ago! Another comment from a statistician is that mentioning time series inter- or extra-polation without a statistical model sounds close to anathema! And makes extrapolation a weapon without a cause.

“…we know, a priori, exactly how long the [simulated annealing] process will take since it is a function of the temperature and the cooling rate.” (p.199)

Unsurprisingly, the section on Monte Carlo integration is disappointing for a statistician/probabilistic numericist like me,  as it fails to give a complete enough picture of the methodology. All simulations seem to proceed there from a large enough hypercube. And recommending the “fantastic” (p.171) R function integrate as a default is scary, given the ability of the selected integration bounds to misled its users. Similarly, I feel that the simulated annealing section is not providing enough of a cautionary tale about the highly sensitive impact of cooling rates and absolute temperatures. It is only through the raw output of the algorithm applied to the travelling salesman problem that the novice reader can perceive the impact of some of these factors. (The acceptance bound on the jump (6.9) is incidentally wrongly called a probability on p.199, since it can take values larger than one.)

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

## what makes variables randoms [book review]

Posted in Books, Mountains, Statistics with tags , , , , , , on July 19, 2017 by xi'an

When the goal of a book is to make measure theoretic probability available to applied researchers for conducting their research, I cannot but applaud! Peter Veazie’s goal of writing “a brief text that provides a basic conceptual introduction to measure theory” (p.4) is hence most commendable. Before reading What makes variables random, I was uncertain how this could be achieved with a limited calculus background, given the difficulties met by our third year maths students. After reading the book, I am even less certain this is feasible!

“…it is the data generating process that makes the variables random and not the data.”

Chapter 2 is about basic notions of set theory. Chapter 3 defines measurable sets and measurable functions and integrals against a given measure μ as

$\sup_\pi \sum_{A\in\pi}\inf_{\omega\in A} f(\omega)\mu(A)$

which I find particularly unnatural compared with the definition through simple functions (esp. because it does not tell how to handle 0x∞). The ensuing discussion shows the limitation of the exercise in that the definition is only explained for finite sets (since the notion of a partition achieving the supremum on page 29 is otherwise meaningless). A generic problem with the book, in that most examples in the probability section relate to discrete settings (see the discussion of the power set p.66). I also did not see a justification as to why measurable functions enjoy well-defined integrals in the above sense. All in all, to see less than ten pages allocated to measure theory per se is rather staggering! For instance,

$\int_A f\text{d}\mu$

does not appear to be defined at all.

“…the mathematical probability theory underlying our analyses is just mathematics…”

Chapter 4 moves to probability measures. It distinguishes between objective (or frequentist) and subjective measures, which is of course open to diverse interpretations. And the definition of a conditional measure is the traditional one, conditional on a set rather than on a σ-algebra. Surprisingly as this is in my opinion one major reason for using measures in probability theory. And avoids unpleasant issues such as Bertrand’s paradox. While random variables are defined in the standard sense of real valued measurable functions, I did not see a definition of a continuous random variables or of the Lebesgue measure. And there are only a few lines (p.48) about the notion of expectation, which is so central to measure-theoretic probability as to provide a way of entry into measure theory! Progressing further, the σ-algebra induced by a random variable is defined as a partition (p.52), a particularly obscure notion for continuous rv’s. When the conditional density of one random variable given the realisation of another is finally introduced (p.63), as an expectation reconciling with the set-wise definition of conditional probabilities, it is in a fairly convoluted way that I fear will scare newcomers out of their wit. Since it relies on a sequence of nested sets with positive measure, implying an underlying topology and the like, which somewhat shows the impossibility of the overall task…

“In the Bayesian analysis, the likelihood provides meaning to the posterior.”

Statistics is hurriedly introduced in a short section at the end of Chapter 4, assuming the notion of likelihood is already known by the readers. But nitpicking (p.65) at the representation of the terms in the log-likelihood as depending on an unspecified parameter value θ [not to be confused with the data-generating value of θ, which does not appear clearly in this section]. Section that manages to include arcane remarks distinguishing maximum likelihood estimation from Bayesian analysis, all this within a page! (Nowhere is the Bayesian perspective clearly defined.)

“We should no more perform an analysis clustered by state than we would cluster by age, income, or other random variable.”

The last part of the book is about probabilistic models, drawing a distinction between data generating process models and data models (p.89), by which the author means the hypothesised probabilistic model versus the empirical or bootstrap distribution. An interesting way to relate to the main thread, except that the convergence of the data distribution to the data generating process model cannot be established at this level. And hence that the very nature of bootstrap may be lost on the reader. A second and final chapter covers some common or vexing problems and the author’s approach to them. Revolving around standard errors, fixed and random effects. The distinction between standard deviation (“a mathematical property of a probability distribution”) and standard error (“representation of variation due to a data generating process”) that is followed for several pages seems to boil down to a possible (and likely) model mis-specification. The chapter also contains an extensive discussion of notations, like indexes (or indicators), which seems a strange focus esp. at this location in the book. Over 15 pages! (Furthermore, I find quite confusing that a set of indices is denoted there by the double barred I, usually employed for the indicator function.)

“…the reader will probably observe the conspicuous absence of a time-honoured topic in calculus courses, the “Riemann integral”… Only the stubborn conservatism of academic tradition could freeze it into a regular part of the curriculum, long after it had outlived its historical importance.” Jean Dieudonné, Foundations of Modern Analysis

In conclusion, I do not see the point of this book, from its insistence on measure theory that never concretises for lack of mathematical material to an absence of convincing examples as to why this is useful for the applied researcher, to the intended audience which is expected to already quite a lot about probability and statistics, to a final meandering around linear models that seems at odds with the remainder of What makes variables random, without providing an answer to this question. Or to the more relevant one of why Lebesgue integration is preferable to Riemann integration. (Not that there does not exist convincing replies to this question!)

## errors, blunders, and lies [book review]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , on July 9, 2017 by xi'an

This new book by David Salsburg is the first one in the ASA-CRC Series on Statistical Reasoning in Science and Society. Which explains why I heard about it both from CRC Press [as a suggested material for a review in CHANCE] and from the ASA [as mass emailing]. The name of the author did not ring a bell until I saw the line about his earlier The Lady Tasting Tea book,  a best-seller in the category of “soft [meaning math- and formula-free] introduction to Statistics through picturesque characters”. Which I did not read either [but Bob Carpenter did].

The current book is of the same flavour, albeit with some maths formulas [each preceded by a lengthy apology for using maths and symbols]. The topic is the one advertised in the title, covering statistical errors and the way to take advantage of them, model mis-specification and robustness, and the detection of biases and data massaging. I read the short book in one quick go, waiting for the results of the French Legislative elections, and found no particular appeal in the litany of examples, historical entries, pitfalls, and models I feel I have already read so many times in the story-telling approach to statistics. (Naked Statistics comes to mind.)

It is not that there anything terrible with the book, which is partly based on the author’s own experience in a pharmaceutical company, but it does not seem to bring out any novelty for engaging into the study of statistics or for handling data in a more rational fashion. And I do not see which portion of the readership is targeted by the book, which is too allusive for academics and too academic for a general audience, who is not necessarily fascinated by the finer details of the history (and stories) of the field. As in The Lady Tasting Tea, the chapters constitute a collection of vignettes, rather than a coherent discourse leading to a theory or defending an overall argument. Some chapters are rather poor, like the initial chapter explaining the distinction between lies, blunders, and errors through the story of the measure of the distance from Earth to Sun by observing the transit of Venus, not that the story is uninteresting, far from it!, but I find it lacking in connecting with statistics [e.g., the meaning of a “correct” observation is never explained]. Or the chapter on the Princeton robustness study, where little is explained about the nature of the wrong distributions, which end up as specific contaminations impacting mostly the variance. And some examples are hardly convincing, like those on text analysis (Chapters 13, 14, 15), where there is little backup for using Benford’s law on such short datasets.  Big data is understood only under the focus of large p, small n, which is small data in my opinion! (Not to mention a minor crime de lèse-majesté in calling Pierre-Simon Laplace Simon-Pierre Laplace! I would also have left the Marquis de aside as this title came to him during the Bourbon Restauration, despite him having served Napoléon for his entire reign.) And, as mentioned above, the book contains apologetic mathematics, which never cease to annoy me since apologies are not needed. While the maths formulas are needed.

## a concise introduction to statistical inference [book review]

Posted in Statistics with tags , , , , , , , , , , on February 16, 2017 by xi'an

[Just to warn readers and avoid emails about Xi’an plagiarising Christian!, this book was sent to me by CRC Press for a review. To be published in CHANCE.]

This is an introduction to statistical inference. And with 180 pages, it indeed is concise! I could actually stop the review at this point as a concise review of a concise introduction to statistical inference, as I do not find much originality in this introduction, intended for “mathematically sophisticated first-time student of statistics”. Although sophistication is in the eye of the sophist, of course, as this book has margin symbols in the guise of integrals to warn of section using “differential or integral calculus” and a remark that the book is still accessible without calculus… (Integral calculus as in Riemann integrals, not Lebesgue integrals, mind you!) It even includes appendices with the Greek alphabet, summation notations, and exponential/logarithms.

“In statistics we often bypass the probability model altogether and simply specify the random variable directly. In fact, there is a result (that we won’t cover in detail) that tells us that, for any random variable, we can find an appropriate probability model.” (p.17)

Given its limited mathematical requirements, the book does not get very far in the probabilistic background of statistics methods, which makes the corresponding chapter not particularly helpful as opposed to a prerequisite on probability basics. Since not much can be proven without “all that complicated stuff about for any ε>0” (p.29). And makes defining correctly notions like the Central Limit Theorem impossible. For instance, Chebychev’s inequality comes within a list of admitted results. There is no major mistake in the chapter, even though mentioning that two correlated Normal variables are jointly Normal (p.27) is inexact.

“The power of a test is the probability that you do not reject a null that is in fact correct.” (p.120)

Most of the book follows the same pattern as other textbooks at that level, covering inference on a mean and a probability, confidence intervals, hypothesis testing, p-values, and linear regression. With some words of caution about the interpretation of p-values. (And the unfortunate inversion of the interpretation of power above.) Even mentioning the Cult [of Significance] I reviewed a while ago.

Given all that, the final chapter comes as a surprise, being about Bayesian inference! Which should make me rejoice, obviously, but I remain skeptical of introducing the concept to readers with so little mathematical background. And hence a very shaky understanding of a notion like conditional distributions. (Which reminds me of repeated occurrences on X validated when newcomers hope to bypass textbooks and courses to grasp the meaning of posteriors and such. Like when asking why Bayes Theorem does not apply for expectations.) I can feel the enthusiasm of the author for this perspective and it may diffuse to some readers, but apart from being aware of the approach, I wonder how much they carry away from this brief (decent) exposure. The chapter borrows from Lee (2012, 4th edition) and from Berger (1985) for the decision-theoretic part. The limitations of the exercise are shown for hypothesis testing (or comparison) by the need to restrict the parameter space to two possible values. And for decision making. Similarly, introducing improper priors and the likelihood principle [distinguished there from the law of likelihood] is likely to get over the head of most readers and clashes with the level of the previous chapters. (And I do not think this is the most efficient way to argue in favour of a Bayesian approach to the problem of statistical inference: I have now dropped all references to the likelihood principle from my lectures. Not because of the controversy, but simply because the students do not get it.) By the end of the chapter, it is unclear a neophyte would be able to spell out how one could specify a prior for one of the problems processed in the earlier chapters. The appendix on de Finetti’s formalism on personal probabilities is very much unlikely to help in this regard. While it sounds so far beyond the level of the remainder of the book.

## another wrong entry

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , on June 27, 2016 by xi'an

Quite a coincidence! I just came across another bug in Lynch’s (2007) book, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. Already discussed here and on X validated. While working with one participant to the post-ISBA softshop, we were looking for efficient approaches to simulating correlation matrices and came [by Google] across the above R code associated with a 3×3 correlation matrix, which misses the additional constraint that the determinant must be positive. As shown e.g. by the example

> eigen(matrix(c(1,-.8,.7,-.8,1,.6,.7,.6,1),ncol=3))
\$values
[1] 1.8169834 1.5861960 -0.4031794


having all correlations between -1 and 1 is not enough. Just. Not. Enough.

## the worst possible proof [X’ed]

Posted in Books, Kids, Statistics, University life with tags , , , , , , on July 18, 2015 by xi'an

Another surreal experience thanks to X validated! A user of the forum recently asked for an explanation of the above proof in Lynch’s (2007) book, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. No wonder this user was puzzled: the explanation makes no sense outside the univariate case… It is hard to fathom why on Earth the author would resort to this convoluted approach to conclude about the posterior conditional distribution being a normal centred at the least square estimate and with σ²X’X as precision matrix. Presumably, he has a poor opinion of the degree of matrix algebra numeracy of his readers [and thus should abstain from establishing the result]. As it seems unrealistic to postulate that the author is himself confused about matrix algebra, given his MSc in Statistics [the footnote ² seen above after “appropriately” acknowledges that “technically we cannot divide by” the matrix, but it goes on to suggest multiplying the numerator by the matrix

$(X^\text{T}X)^{-1} (X^\text{T}X)$

which does not make sense either, unless one introduces the trace tr(.) operator, presumably out of reach for most readers]. And this part of the explanation is unnecessarily confusing in that a basic matrix manipulation leads to the result. Or even simpler, a reference to Pythagoras’  theorem.