Archive for Books

Is that a big number? [book review]

Posted in Books, Kids, pictures, Statistics on July 31, 2018 by xi'an

A book I received a few days ago, prior to its publication, from Oxford University Press (OUP), as a book editor for CHANCE (usual provisions apply: the contents of this post will be more or less reproduced in my column in CHANCE when it appears). A copy that I found in my mailbox in Warwick last week and read over the (very hot) weekend.

The overall aim of this book by Andrew Elliott is to encourage numeracy (or fight innumeracy) by making sense of absolute quantities, putting them in perspective, and teaching about log scales, visualisation, and divide-and-conquer techniques. And by providing a massive list of examples and comparisons, sometimes for page after page… The book is associated with a fairly rich website, itself linked with the many blogs of the author and a myriad of other links and items of information (among which I learned of the recent and absurd launch of Elon Musk’s Tesla car into space! A première in garbage dumping…). From what I can gather from these sites, some (most?) of the material in the book seems to have emerged from the various blog entries.

“Length of River Thames (386 km) is 2 x length of the Suez Canal (193.3 km)”

Maybe I was too exhausted by the heat and a very busy week of computational statistics in Warwick, the 2018 football World Cup having nothing to do with this, but I could not keep reading the chapters of the book in a continuous manner, suffering from massive information overload! Being given thousands of entries kills [for me] the appeal of giving weight or sense to large and very large and humongous quantities. And the final vignette in each chapter, pairing numbers like the one above or the one below,

“Time since earliest writing (5200 y) is 25 x time since birth of Darwin (208 y)”

only evokes the remote memory of some children’s magazine I read from time to time as a kid, with this type of entry (I cannot remember its name!). Or maybe it was a magazine I would browse while waiting at the hairdresser’s (which brings back memories of endless waits, maybe because I did not like going to the hairdresser…). Some of the background about measurements and other curios carries a sense of Wikipediesque absoluteness in its minute details.

A last point of disappointment about the book is the poor graphical design or support. While the author insists on the importance of visualisation in grasping the scales of large quantities, and while the webpage is full of such entries, there are very few great graphs to be found in “Is that a big number?” Some of the pictures seem taken from an anonymous databank (where are the towers of San Gimignano?!) and there are not enough graphics. One misses, for instance, the fantastic xkcd graphics, like the money chart poster. Or the one about the future. Or many, many others.

While the style is sometimes light and funny, an overall impression of dryness remains and, in comparison, I much preferred Kaiser Fung’s Numbers rule your world and, even more, both Guesstimation books!

resampling methods

Posted in Books, pictures, Running, Statistics, Travel, University life on December 6, 2017 by xi'an

A paper that was arXived [and that I missed!] last summer is a work on resampling by Mathieu Gerber, Nicolas Chopin (CREST), and Nick Whiteley. Resampling is used to sample from a weighted empirical distribution and to correct for very small weights in a weighted sample that would otherwise lead to degeneracy in sequential Monte Carlo (SMC). Since this step is based on random draws, it induces noise (while improving the estimation of the target), and reducing this noise is preferable, hence the appeal of replacing plain multinomial sampling with more advanced schemes. The initial motivation comes from sequential Monte Carlo, where resampling is rife and seemingly compulsory, but this also applies to importance sampling when considering several schemes at once. I remember discussing alternative schemes with Nicolas, who was then completing his PhD, as well as with Olivier Cappé, Randal Douc, and Eric Moulines at the time (circa 2004) when we were working on the Hidden Markov book. And getting then only a somewhat vague idea as to why systematic resampling failed to converge.

In this paper, Mathieu, Nicolas and Nick show that stratified sampling (where an independent uniform is generated on every interval of length 1/n) enjoys some form of consistency, while systematic sampling (where the “same” uniform is generated on every interval of length 1/n) does not necessarily enjoy this consistency. There actually exist cases where convergence does not occur. However, a residual version of systematic sampling (where systematic sampling is applied to the residual decimal parts of the n-enlarged weights) is itself consistent.
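For concreteness, here is a minimal R sketch of these schemes (my own illustration of the generic algorithms, not code from the paper), each taking a vector w of normalised weights and returning resampled particle indices:

# multinomial resampling: n iid draws from the weighted empirical distribution
multinomial_resample <- function(w) {
  n <- length(w)
  sample(n, n, replace = TRUE, prob = w)
}

# stratified resampling: one independent uniform per interval of length 1/n
stratified_resample <- function(w) {
  n <- length(w)
  u <- ((1:n) - 1 + runif(n)) / n
  findInterval(u, cumsum(w)) + 1
}

# systematic resampling: a single uniform shifted across all n intervals
systematic_resample <- function(w) {
  n <- length(w)
  u <- ((1:n) - 1 + runif(1)) / n
  findInterval(u, cumsum(w)) + 1
}

# residual-systematic resampling: floor(n*w) deterministic copies, the
# remainder filled by systematic resampling on the residual decimal parts
residual_systematic_resample <- function(w) {
  n <- length(w)
  m <- floor(n * w)
  r <- n * w - m
  k <- n - sum(m)
  if (k == 0) return(rep(1:n, m))
  u <- ((1:k) - 1 + runif(1)) / k
  c(rep(1:n, m), findInterval(u, cumsum(r / sum(r))) + 1)
}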

The paper also studies the surprising feature uncovered by Kitagawa (1996) that stratified sampling applied to an ordered sample brings an error of O(1/n²) between the cdfs, rather than the usual O(1/n). It took me a while to even understand the distinction between the original and the ordered version (maybe because Nicolas used the empirical cdf during his SAD (Stochastic Algorithm Day!) talk, an ecdf that is the same for the ordered and initial samples). And both systematic and deterministic sampling become consistent in this case. The result was shown in dimension one by Kitagawa (1996) but extends to larger dimensions via the magical trick of the Hilbert curve.
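To get a feel for the impact of the ordering (a quick Monte Carlo sketch of mine, reusing the stratified_resample function above on an arbitrary Gaussian importance sampling example), one can compare the variability of a resampled estimate when the particles are left in random order versus sorted:

set.seed(1)
n <- 1e3
x <- rnorm(n)                            # particles from a N(0,1) proposal
w <- dnorm(x, 1, 1) / dnorm(x)           # importance weights targeting N(1,1)
w <- w / sum(w)
est <- function(ord) {                   # resampled estimate of the target mean
  idx <- stratified_resample(w[ord])
  mean(x[ord][idx])
}
raw <- replicate(1e3, est(sample(n)))    # particles in random order
srt <- replicate(1e3, est(order(x)))     # particles sorted by value
var(raw) / var(srt)                      # sorting brings a clear variance reduction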

mea culpa!

Posted in Books, Kids, R, Statistics, University life on October 9, 2017 by xi'an

An entry about our Bayesian Essentials book on X validated alerted me to a typo in the derivation of the Gaussian posterior…! When deriving this posterior (which was left as an exercise in Bayesian Core), I just forgot the term expressing the divergence between the prior mean and the sample mean. Mea culpa!!!
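For the record, in the conjugate Normal setting (with notations of my own here, not necessarily those of the book), x_1,…,x_n iid N(θ,σ²) with prior θ ~ N(μ,σ²/λ), completing the square gives the posterior

\theta\mid\bar{x}_n \sim \mathcal{N}\left(\frac{\lambda\mu+n\bar{x}_n}{\lambda+n},\frac{\sigma^2}{\lambda+n}\right)

while the forgotten factor,

\exp\left\{-\frac{\lambda n}{2\sigma^2(\lambda+n)}(\bar{x}_n-\mu)^2\right\}

measuring the divergence between the prior mean and the sample mean, moves into the marginal likelihood.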

LaTeX issues from Vienna

Posted in Books, Statistics, University life on September 21, 2017 by xi'an

When working on the final stage of our edited handbook on mixtures, in Vienna, I came across unexpected practical difficulties! One was that, by working on Dropbox with Windows users, file and directory names suddenly switched from upper-case to lower-case letters!, making hard-wired paths to figures and subsections void in the numerous LaTeX files used for the book. And forcing us to change to lower case everywhere. Having not worked under Windows since George Casella gave me my first laptop in the mid-90’s!, I am amazed that this inability to distinguish upper- and lower-case names is still an issue. And that Dropbox replicates it. (And that some people see that as a plus.)

The other LaTeX issue that took a while to solve was that we opted for one bibliography per chapter, rather than a single bibliography at the end of the book, mainly because CRC Press asked for this feature in order to sell chapters individually… This was my first encounter with this issue and I found the solutions for producing individual bibliographies incredibly heavy-handed, whether through chapterbib or bibunits, since one has to run bibtex on one .aux file per chapter. Even with a one-line bash command,

for f in bu*aux; do bibtex `basename $f .aux`; done

this is annoying in the extreme!
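For reference, the generic bibunits setup looks like the following (a minimal sketch of the package’s standard usage, not our actual preamble):

\usepackage{bibunits}
\defaultbibliographystyle{plain}
\defaultbibliography{refs}  % a shared refs.bib, say
...
\begin{bibunit}
\chapter{Some chapter}
... \cite{...} ...
\putbib
\end{bibunit}

each bibunit environment writing its own bu1.aux, bu2.aux, &tc., hence the bash loop above.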

zurück nach Wien

Posted in pictures, Running, Statistics, Travel, University life, Wines on September 16, 2017 by xi'an

Today, I am travelling to Vienna for a few days, primarily for assessing a grant renewal for a research consortium federating most Austrian research groups on a topic for which Austria is a world leader. (Sorry for being cryptic, but I am unsure how much I can disclose about this assessment!) And taking advantage of being in Vienna for a two-day editing session with Sylvia Frühwirth-Schnatter and Gilles Celeux on our Handbook of mixture analysis project. Which started a few years ago with another meeting in Vienna. And taking further advantage of being in Vienna for an evening at the Volksoper, conveniently playing Die Zauberflöte!

an elegant result on exponential spacings

Posted in Statistics on April 19, 2017 by xi'an

A question on X validated that I spotted on the train back from Lyon got me desperately seeking a reference in Devroye’s Generation Bible, despite the abysmal wireless and a group of screeching urchins a few seats away from me… The question is about why

\sum_{i=1}^{n}(Y_i - Y_{(1)}) \sim \text{Gamma}(n-1, 1)

when the Y’s are standard Exponentials. Since this immediately reminded me of exponential spacings, thanks to our Devroye fan-club reading group in Warwick, I tried to download Devroye’s Chapter V and managed to do so after a few aborted attempts (and a significant increase in decibels from the family corner). The result, due to Sukhatme (1937), is in plain sight as Theorem 2.3 and is quite elegant, as it relies on the fact that

\sum_{i=1}^n y_i=\sum_{j=1}^n (n-j+1)(y_{(j)}-y_{(j-1)})\qquad\text{and hence}\qquad\sum_{i=1}^n (y_i-y_{(1)})=\sum_{j=2}^n (n-j+1)(y_{(j)}-y_{(j-1)})

(with the convention y_{(0)}=0), which makes the result a mere linear change of variables! Since the normalised spacings (n-j+1)(y_{(j)}-y_{(j-1)}) are themselves iid standard Exponentials, the sum of the last n-1 of them is indeed a Gamma(n-1,1) variate. (Pandurang Vasudeo Sukhatme (1911–1997) was an Indian statistician who worked on human nutrition and was awarded the Guy Medal of the RSS in 1963.)
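And the result is straightforward to check by simulation (a quick R sketch of mine):

# Monte Carlo check that the sum of the centred Exponentials is Gamma(n-1,1)
n <- 10
sims <- replicate(1e5, {y <- rexp(n); sum(y - min(y))})
hist(sims, prob = TRUE, breaks = 100)
curve(dgamma(x, shape = n - 1, rate = 1), add = TRUE, col = "red")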

simulation by hand

Posted in Books, Kids, pictures, Statistics, Travel on November 28, 2016 by xi'an

A rather weird question on X validated this week was about devising a manual way to simulate (a few) Normal variates. By manual, I presume the author of the question means without resorting to a computer or any other business machine. Now, I do not know of any real phenomenon that is exactly and provably Normal. As analysed in a great philosophy of science paper by Aidan Lyon, the standard explanations for a real phenomenon being Normal are almost invariably false, even those invoking the Central Limit Theorem. Hence I cannot think of a mechanical device that would directly return generations from a Normal distribution with known parameters. However, since it is possible to simulate Uniform U(0,1) variates by hand [up to a given precision], using a chronometer or a wheel, calls to versions of the Box-Müller algorithm that do not rely on logarithmic or trigonometric functions are feasible, for instance by generating two Exponential variates, x and y, until 2y>(1-x)², x being the output (up to a random sign). And generating Exponential variates is easy, provided a radioactive material with a known half-life is available, along with a Geiger counter. Or, if not, by calling von Neumann’s exponential generator, as detailed in Devroye’s simulation book.
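In more algorithmic terms, the accept-reject step above translates as follows (an R transcription of mine, feasible by hand since only additions, squarings, comparisons and a sign flip are involved):

# Normal generation from Exponential proposals, with no log or trig calls
norm_by_hand <- function() {
  repeat {
    x <- rexp(1)                  # Exponential proposal
    y <- rexp(1)                  # Exponential used for the acceptance test
    if (2 * y > (1 - x)^2) break  # accept x as a half-Normal variate
  }
  sample(c(-1, 1), 1) * x         # random sign turns half-Normal into Normal
}
sims <- replicate(1e4, norm_by_hand())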

After proposing this solution, I received a comment from the author of the question asking for a simpler solution based, e.g., on the Central Limit Theorem, presumably for simple iid random variables such as coin tosses or dice experiments. While I used the CLT for simulating Normal variables in my very early days [just after programming on punched cards!], I do not think this is a very good or efficient method, as the tails converge very slowly to normality. By comparison, using the same number of coin tosses to create a sufficient number of binary digits of a Uniform variate produces a Uniform variate exact to computer precision, which can be exploited in Box-Müller-like algorithms to return exact Normal variates… Even by hand if necessary. [For some reason, this question attracted a lot of traffic and an encyclopaedic answer on X validated, despite being borderline to the point of being proposed for closure.]
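The two uses of the same k coin tosses can be contrasted as follows (a sketch of mine, assuming fair tosses and reverting to the log-and-trig form of Box-Müller for the computer illustration):

k <- 32
tosses <- rbinom(k, 1, 0.5)                  # k fair coin tosses
# CLT route: centred and scaled sum, only approximately Normal (poor tails)
z_clt <- (sum(tosses) - k / 2) / sqrt(k / 4)
# binary-digit route: the tosses become the bits of two 16-bit Uniforms
u1 <- sum(tosses[1:16] * 2^-(1:16))
u2 <- sum(tosses[17:32] * 2^-(1:16))
# Box-Müller then returns a Normal variate exact to 16-bit precision
# (barring the probability 2^-16 event u1 = 0)
z_bm <- sqrt(-2 * log(u1)) * cos(2 * pi * u2)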