## estimation exam [best of]

Posted in Books, Kids, Statistics with tags , , , , , , , , on January 29, 2019 by xi'an

Yesterday, I received a few copies of our CRC Press Handbook of Mixture Analysis, while grading my mathematical statistics exam 160 copies. Among the few goodies, I noticed the always popular magical equality

E[1/T]=1/E[T]

that must have been used in so many homeworks and exam handouts by now that it should become a folk theorem. More innovative is the argument that E[1/min{X¹,X²,…}] does not exist for iid U(0,θ) because it is the minimum and thus is the only one among the order statistics with the ability to touch zero. Another universal shortcut was the completeness conclusion that when the integral

$\int_0^\theta \varphi(x) x^k \text{d}x$

was zero for all θ’s then φ had to be equal to zero with no further argument (only one student thought to take the derivative). Plus a growing inability in the cohort to differentiate even simple functions… (At least, most students got the bootstrap right, as exemplified by their R code.) And three stars to the student who thought of completely gluing his anonymisation tag, on every one of his five sheets!, making identification indeed impossible, except by elimination of the 159 other names.

## a book and two chapters on mixtures

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on January 8, 2019 by xi'an

The Handbook of Mixture Analysis is now out! After a few years of planning, contacts, meetings, discussions about notations, interactions with authors, further interactions with late authors, repeating editing towards homogenisation, and a final professional edit last summer, this collection of nineteen chapters involved thirty-five contributors. I am grateful to all participants to this piece of work, especially to Sylvia Früwirth-Schnatter for being a driving force in the project and for achieving a much higher degree of homogeneity in the book than I expected. I would also like to thank Rob Calver and Lara Spieker of CRC Press for their boundless patience through the many missed deadlines and their overall support.

Two chapters which I co-authored are now available as arXived documents:

along other chapters

## surprises in probability [book review]

Posted in Books, Statistics, Travel with tags , , , , , , , , , on November 20, 2018 by xi'an

A very short book (128 pages, but with a very high price!) I received from CRC Press is Henk Tijms’ Surprises in Probability (Seventeen Short Stories). Henk Tijms is an emeritus professor of econometrics at the Vrije University in Amsterdam and he wrote these seventeen pieces either for the Dutch Statistical Society magazine or for a blog he ran for the NYt. (The video of A Night in Casablanca above is only connected to this blog through Chico mimicking the word surprise as soup+rice.)

The author mentions that the book can be useful for teachers and indeed this is a collection of surprising probability results, surprising in the sense that the numerical probabilities are not necessarily intuitive. Most illustrations involve betting of one sort or another,  with only basic (combinatorial) probability distributions involved. Readers should not worry about even this basic probability background since most statements are exposed without a proof. Most examples are very classical, from the prisoner’s problem, to the Monty Hall paradox, to the birthday problem, to Benford’s distribution of digits, to gambler’s ruin, gambler’s fallacy, and the St Petersbourg paradox, to the secretary’s problem and stopping rules. The most advanced notion is the one of (finite state) Markov chains. As martingales are only mentionned in connection with pseudo-probabilist schemes for winning the lottery. For which (our very own!) Jeff Rosenthal makes an appearance, thanks to his uncovering of the Ontario Lottery scam!

“In no other branch of mathematics is it so easy for experts to blunder as in probability theory.”  Martin Gardner

A few stories have entries about Bayesian statistics, with mentions made of the O.J. Simpson, Sally Clark and Lucia de Berk miscarriages of justice, although these mentions make the connection most tenuous. Simulation is also mentioned as a manner of achieving approximations to more complex probabilities. But not to the point of discussing surprises about simulation, which could have been the case with the simulation of rare events.

Ten most beautiful probability formulas (Story 10) reminded me of Ian Steward 17 formulas that changed the World. Obviously at another scale and in a much less convincing way. To wit, the Normal (or Gauss) density, Bayes’ formula, the gambler’s ruin formula, the squared-root formula (meaning standard deviation decreases as √n), Kelly’s betting formula (?), the asymptotic law of distribution of prime numbers (??), another squared-root formula for the one-dimensional random walk, the newsboy formula (?), the Pollaczek-Khintchine formula (?), and the waiting-time formula. I am not sure I would have included any of these…

All in all this is a nice if unsurprising database for illustrations and possibly exercises in elementary probability courses, although it will require some work from the instructor to link the statements to their proof. As one would expect from blog entries. But this makes for a nice reading, especially while traveling and I hope some fellow traveler will pick the book from where I left it in Mexico City airport.

## ABC in print

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , on September 5, 2018 by xi'an

The CRC Press Handbook of ABC is now out, after a rather long delay [the first version of our model choice chapter was written in 2015!] due to some late contributors Which is why I did not spot it at JSM 2018. As announced a few weeks ago, our Handbook of Mixture Analysis is soon to be published as well. (Not that I necessarily advocate the individual purchase of these costly volumes!, especially given most chapters are available on-line.)

## Handbook of Mixture Analysis [cover]

Posted in Books, Statistics, University life with tags , , , , , , , , on August 15, 2018 by xi'an

On the occasion of my talk at JSM2018, CRC Press sent me the cover of our incoming handbook on mixture analysis, courtesy of Rob Calver who managed to get it to me on very short notice! We are about ready to send the manuscript to CRC Press and hopefully the volume will get published pretty soon. It would have been better to have it ready for JSM2018, but we editors got delayed by a few months for the usual reasons.

## computational methods for numerical analysis with R [book review]

Posted in Books, Kids, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , , on October 31, 2017 by xi'an

This is a book by James P. Howard, II, I received from CRC Press for review in CHANCE. (As usual, the customary warning applies: most of this blog post will appear later in my book review column in CHANCE.) It consists in a traditional introduction to numerical analysis with backup from R codes and packages. The early chapters are setting the scenery, from basics on R to notions of numerical errors, before moving to linear algebra, interpolation, optimisation, integration, differentiation, and ODEs. The book comes with a package cmna that reproduces algorithms and testing. While I do not find much originality in the book, given its adherence to simple resolutions of the above topics, I could nonetheless use it for an elementary course in our first year classes. With maybe the exception of the linear algebra chapter that I did not find very helpful.

“…you can have a solution fast, cheap, or correct, provided you only pick two.” (p.27)

The (minor) issue I have with the book and that a potential mathematically keen student could face as well is that there is little in the way of justifying a particular approach to a given numerical problem (as opposed to others) and in characterising the limitations and failures of the presented methods (although this happens from time to time as e.g. for gradient descent, p.191). [Seeping in my Gallic “mal-être”, I am prone to over-criticise methods during classing, to the (increased) despair of my students!, but I also feel that avoiding over-rosy presentations is a good way to avoid later disappointments or even disasters.] In the case of this book, finding [more] ways of detecting would-be disasters would have been nice.

An uninteresting and highly idiosyncratic side comment is that the author preferred the French style for long division to the American one, reminding me of my first exposure to the latter, a few months ago! Another comment from a statistician is that mentioning time series inter- or extra-polation without a statistical model sounds close to anathema! And makes extrapolation a weapon without a cause.

“…we know, a priori, exactly how long the [simulated annealing] process will take since it is a function of the temperature and the cooling rate.” (p.199)

Unsurprisingly, the section on Monte Carlo integration is disappointing for a statistician/probabilistic numericist like me,  as it fails to give a complete enough picture of the methodology. All simulations seem to proceed there from a large enough hypercube. And recommending the “fantastic” (p.171) R function integrate as a default is scary, given the ability of the selected integration bounds to misled its users. Similarly, I feel that the simulated annealing section is not providing enough of a cautionary tale about the highly sensitive impact of cooling rates and absolute temperatures. It is only through the raw output of the algorithm applied to the travelling salesman problem that the novice reader can perceive the impact of some of these factors. (The acceptance bound on the jump (6.9) is incidentally wrongly called a probability on p.199, since it can take values larger than one.)

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

## errors, blunders, and lies [book review]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , on July 9, 2017 by xi'an

This new book by David Salsburg is the first one in the ASA-CRC Series on Statistical Reasoning in Science and Society. Which explains why I heard about it both from CRC Press [as a suggested material for a review in CHANCE] and from the ASA [as mass emailing]. The name of the author did not ring a bell until I saw the line about his earlier The Lady Tasting Tea book,  a best-seller in the category of “soft [meaning math- and formula-free] introduction to Statistics through picturesque characters”. Which I did not read either [but Bob Carpenter did].

The current book is of the same flavour, albeit with some maths formulas [each preceded by a lengthy apology for using maths and symbols]. The topic is the one advertised in the title, covering statistical errors and the way to take advantage of them, model mis-specification and robustness, and the detection of biases and data massaging. I read the short book in one quick go, waiting for the results of the French Legislative elections, and found no particular appeal in the litany of examples, historical entries, pitfalls, and models I feel I have already read so many times in the story-telling approach to statistics. (Naked Statistics comes to mind.)

It is not that there anything terrible with the book, which is partly based on the author’s own experience in a pharmaceutical company, but it does not seem to bring out any novelty for engaging into the study of statistics or for handling data in a more rational fashion. And I do not see which portion of the readership is targeted by the book, which is too allusive for academics and too academic for a general audience, who is not necessarily fascinated by the finer details of the history (and stories) of the field. As in The Lady Tasting Tea, the chapters constitute a collection of vignettes, rather than a coherent discourse leading to a theory or defending an overall argument. Some chapters are rather poor, like the initial chapter explaining the distinction between lies, blunders, and errors through the story of the measure of the distance from Earth to Sun by observing the transit of Venus, not that the story is uninteresting, far from it!, but I find it lacking in connecting with statistics [e.g., the meaning of a “correct” observation is never explained]. Or the chapter on the Princeton robustness study, where little is explained about the nature of the wrong distributions, which end up as specific contaminations impacting mostly the variance. And some examples are hardly convincing, like those on text analysis (Chapters 13, 14, 15), where there is little backup for using Benford’s law on such short datasets.  Big data is understood only under the focus of large p, small n, which is small data in my opinion! (Not to mention a minor crime de lèse-majesté in calling Pierre-Simon Laplace Simon-Pierre Laplace! I would also have left the Marquis de aside as this title came to him during the Bourbon Restauration, despite him having served Napoléon for his entire reign.) And, as mentioned above, the book contains apologetic mathematics, which never cease to annoy me since apologies are not needed. While the maths formulas are needed.