## inverse Gaussian trick [or treat?]

Posted in Books, Kids, R, Statistics, University life on October 29, 2020 by xi'an

When preparing my mid-term exam for my undergrad mathematical statistics course, I wanted to use the inverse Gaussian distribution IG(μ,λ) as an example of an exponential family and to include a question on random generation. As in the Fortran code of Michael, Schucany and Haas, a simple generator can be based on simulating a χ²(1) variate v and solving in x the following second degree polynomial equation

$\dfrac{\lambda(x-\mu)^2}{\mu^2 x} = v$

since, when X~IG(μ,λ), the left-hand side transform is distributed as a χ²(1) random variable. The smaller root x₁, less than μ, is then chosen with probability μ/(μ+x₁) and the larger one, x₂=μ²/x₁, with probability x₁/(μ+x₁). A relatively easy question then, except when one considers asking for the proof of the χ²(1) result, which proved a tougher cookie than expected! The paper usually referred to for the result, Schuster (1968), is quite cryptic on the matter, essentially stating that the above can be expressed as a (bijective) transform of Y=min(X,μ²/X) and that V~χ²(1) follows immediately. I eventually worked out a proof by the “law of the unconscious statistician” [a name I do not find particularly amusing!], but did not include the question in the exam. Still, I found it fairly interesting that the inverse Gaussian can be generated by “inverting” the above equation, i.e., going from a (squared) Gaussian variate V to the inverse Gaussian variate X. (Even though the name stems from the two cumulant generating functions being inverses of one another.)
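For the χ²(1) claim itself, one short argument (which may or may not coincide with the proof alluded to above) goes through the moment generating function of V, using the IG(μ,λ) density $f(x)=\sqrt{\lambda/(2\pi x^3)}\,\exp\{-\lambda(x-\mu)^2/(2\mu^2x)\}$ on x>0 and writing $\lambda_t=(1-2t)\lambda$ for t<1/2:

$\mathbb{E}\left[e^{tV}\right]=\displaystyle\int_0^\infty \sqrt{\dfrac{\lambda}{2\pi x^3}}\exp\left\{-\dfrac{\lambda_t(x-\mu)^2}{2\mu^2 x}\right\}\mathrm{d}x=\sqrt{\dfrac{\lambda}{\lambda_t}}\displaystyle\int_0^\infty \sqrt{\dfrac{\lambda_t}{2\pi x^3}}\exp\left\{-\dfrac{\lambda_t(x-\mu)^2}{2\mu^2 x}\right\}\mathrm{d}x=(1-2t)^{-1/2}$

since the last integrand is the IG(μ,λ_t) density and $(1-2t)^{-1/2}$ is the χ²(1) moment generating function. Computing $\mathbb{E}[e^{tV}]$ directly against the density of X is itself a use of the law of the unconscious statistician.

Turning to the generator, a minimal R sketch of the multiple-roots scheme follows. It is not the Fortran code of Michael, Schucany and Haas; the function name `rinvgauss_msh` is hypothetical, and the smaller root is the closed-form solution of the quadratic $\lambda x^2-(2\lambda\mu+\mu^2 v)x+\lambda\mu^2=0$ obtained by clearing denominators in the equation above.

```r
# Minimal sketch of the multiple-roots generator for IG(mu, lambda);
# rinvgauss_msh is a hypothetical name, not a package function
rinvgauss_msh <- function(n, mu, lambda) {
  v <- rnorm(n)^2                                  # chi-squared(1) draws
  # smaller root of lambda * (x - mu)^2 / (mu^2 * x) = v, never above mu
  x1 <- mu + mu^2 * v / (2 * lambda) -
    mu / (2 * lambda) * sqrt(4 * mu * lambda * v + mu^2 * v^2)
  u <- runif(n)
  # keep x1 with probability mu / (mu + x1), else switch to the larger root mu^2 / x1
  ifelse(u <= mu / (mu + x1), x1, mu^2 / x1)
}

set.seed(1)
mu <- 2; lambda <- 3
x <- rinvgauss_msh(1e5, mu, lambda)
c(mean = mean(x), var = var(x))       # compare with mu = 2 and mu^3 / lambda = 8/3
v <- lambda * (x - mu)^2 / (mu^2 * x) # back-transform, should look chi-squared(1)
ks.test(v, "pchisq", df = 1)
```

The printed mean and variance can be compared with the IG(μ,λ) moments μ and μ³/λ, and the Kolmogorov-Smirnov test checks that the back-transformed values behave like a χ²(1) sample.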

## a jump back in time

Posted in Books, Kids, Statistics, Travel, University life on October 1, 2018 by xi'an

As the Department of Statistics in Warwick is slowly emptying its shelves and offices for the big migration to the new building that is almost completed, books and documents are abandoned in the corridors and the work spaces. On this occasion, I thus happened to spot a vintage edition of the Valencia 3 proceedings. I had missed this meeting and hence the volume since, during the last year of my PhD, I was drafted into the French Navy and, as a result, prohibited from travelling abroad. (Although on reflection I could have safely done it with no one in the military the wiser!) Reading through the papers thirty years later is a weird experience, as I do not remember most of them, the exception being the mixture modelling paper by José Bernardo and Javier Girón, which I studied a few years later when writing the mixture estimation and simulation paper with Jean Diebolt. And then again in our much more recent non-informative paper with Clara Grazian. And Prem Goel’s survey of Bayesian software. That is, 1987 state-of-the-art software. Covering an amazing list of eighteen packages. Including programs by Zellner, Tierney, Schervish, Smith [but no MCMC], Jaynes, Goldstein, Geweke, van Dijk, and Bauwens, which apparently did not survive to this day. Most were in Fortran but S was also mentioned. And another version of Tierney, Kass and Kadane’s work on Laplace approximations. And the reference paper of Dennis Lindley [who had already retired from UCL at that time!] on the Hardy-Weinberg equilibrium. And another paper by Don Rubin on using SIR (Rubin, 1983) for simulating from posterior distributions with missing data. Ten years before the particle filter paper, and apparently missing the possibility of weights with infinite variance.
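Rubin's SIR (sampling importance resampling) idea fits in a few lines: draw from a proposal, weight each draw by the ratio of the (unnormalised) target density to the proposal density, then resample according to those weights. The R sketch below is only an illustration: the function name `sir_sample`, the toy N(0,1) target, the Cauchy proposal and resampling with replacement are all chosen for illustration and are not taken from the paper. With these choices the weights are bounded; swapping the roles (a Normal proposal for a Cauchy target) gives weights with infinite variance, the issue mentioned above.

```r
# Minimal sketch of sampling importance resampling (SIR); sir_sample is a hypothetical name
sir_sample <- function(n_out, n_prop, log_target, rprop, log_prop) {
  theta <- rprop(n_prop)                        # draws from the proposal
  logw <- log_target(theta) - log_prop(theta)   # unnormalised log importance weights
  w <- exp(logw - max(logw))                    # rescale before exponentiating, for stability
  idx <- sample.int(n_prop, size = n_out, replace = TRUE, prob = w)
  theta[idx]                                    # resampled values, approximately from the target
}

set.seed(2)
# toy target: N(0, 1) known up to a constant; the heavier-tailed Cauchy proposal keeps weights bounded
draws <- sir_sample(
  n_out = 1e4, n_prop = 1e5,
  log_target = function(t) -t^2 / 2,
  rprop = function(n) rcauchy(n),
  log_prop = function(t) dcauchy(t, log = TRUE)
)
c(mean = mean(draws), sd = sd(draws))
```

The resampled values should have a mean close to 0 and a standard deviation close to 1.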

There already were some illustrations of Bayesian analysis in action, including one by Jay Kadane reproduced in his book. And several papers by Jim Berger, Tony O’Hagan, Luis Pericchi and others on imprecise Bayesian modelling, which was in tune with the era, the imprecise probability book by Peter Walley being about to appear. And a paper by Shaw on numerical integration that mentioned quasi-random methods. Applied to a 12-component Normal mixture. Overall, the content was much less theoretical than I would have expected. And nothing about shrinkage estimators, although some of the speakers had recently worked on this topic.

At a less fundamental level, this was a time when LaTeX was becoming a standard, as shown by a few papers in the volume (and as I was to find when visiting Purdue the year after), even though most were still typed on a typewriter, including a handwritten addition by Dennis Lindley. And Warwick appeared as a Bayesian hotpot!, with at least five papers written by people there permanently or on a long-term visit. (In case a local is interested in it, I have kept the volume, to be found in my new office!)

## Sobol’s Monte Carlo

Posted in Books, pictures, Statistics, University life on December 10, 2016 by xi'an

The name of Ilya Sobol is familiar to researchers in quasi-Monte Carlo methods for his Sobol sequences. I was thus surprised to find in my office a small book entitled The Monte Carlo Method by this author, which is a translation of his 1968 book in Russian. I have no idea how it reached my office and I went to check with the library of Paris-Dauphine around the corner [of my corridor] whether it had been lost: apparently, the library got rid of it among a collection of old books… Now, having read through this 67-page book (or booklet as Sobol puts it), I somewhat agree with the librarians, in that there is nothing of major relevance in this short introduction. It is quite interesting to go through the book and see the basic principles of simulation and Monte Carlo techniques unfolding, from the inverse cdf principle [established by a rather convoluted proof] to importance sampling, but the amount of information is about equivalent to the Wikipedia entry on the topic. From a historical perspective, it is also captivating to see the efforts to connect physical random generators (such as those based on vacuum tube noise) to shift-register pseudo-random generators created by Sobol in 1958. On a Soviet Strela computer.

While Googling the title of that book did not turn up any connection, I found out that a 1994 version had been published under the title A Primer for the Monte Carlo Method, which is mostly the same as my version, except for a few additional sections on pseudo-random generation, from the congruential method (with a FORTRAN code) to the accept-reject method, then called von Neumann’s instead of Neyman’s, to the notion of constructive dimension of a simulation technique, which amounts to demarginalisation, to quasi-Monte Carlo [for three pages]. A funny side note is that the author mentions in the preface that the first translation [now in my office] was published without his permission!

## Extending R

Posted in Books, Kids, R, Statistics on July 13, 2016 by xi'an

As I was previously unaware of this book coming up, my surprise and excitement were both extreme when I received it from CRC Press a few weeks ago! John Chambers, one of the fathers of S, precursor of R, had just published a book about extending R. It covers some reflections of the author on programming and the story of R (Parts 2 and 1), and then focuses on object-oriented programming (Part 3) and the interfaces from R to other languages (Part 4). While this is “only” a programming book, and thus not strictly appealing to statisticians, reading one of the original actors’ thoughts on the past, present, and future of R is simply fantastic!!! And John Chambers is definitely not calling for simply starting over and building something better, as Ross Ihaka did in this [most read] post a few years ago. (It is also great to see the names of friends appearing at times, like Julie, Luke, and Duncan!)

“I wrote most of the original software for S3 methods, which were useful for their application, in the early 1990s.”

In the (hi)story part, Chambers delves into the details of the evolution of S at Bell Labs, as described in his [first] “blue book” (which I kept on my shelf until very recently, next to the “white book”!) and of the emergence of R in the mid-1990s. I find those sections all the more fascinating because I am somewhat of a contemporary, having first learned Fortran (and Pascal) in the mid-1980s, before moving in the early 1990s to C (which I mostly coded as translated Pascal!), S-plus and eventually R, in conjunction with a (forced) migration from Unix to Linux, as my local computer managers abandoned Unix and mainframe in favour of some virtual Windows machines. And as I started running R on laptops with the help of friends more skilled than I (again keeping some of the early R manuals on my shelf until recently). Maybe one of the most surprising things about those reminiscences is that the very first version of R was dated Feb 29, 2000! Not because of Feb 29, 2000 (which, as Chambers points out, is the first use of the third-order correction to the Gregorian calendar, although I would have thought 1600 was the first one), but because I would have thought it appeared earlier, in conjunction with my first Linux laptop, but this memory is alas getting too vague!
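The calendar aside can be checked against the usual Gregorian rule, read as three successive corrections: a leap year every fourth year, except century years, except those divisible by 400. The R sketch below uses a hypothetical helper name and assumes the 1582 introduction of the calendar, ignoring the later adoption dates in many countries; under that assumption 1600 is the first year where the third correction applies.

```r
# Gregorian leap-year rule as three nested corrections (is_gregorian_leap is a hypothetical name)
is_gregorian_leap <- function(year) {
  (year %% 4 == 0) &                    # first order: every fourth year
    ((year %% 100 != 0) |               # second order: skip century years...
       (year %% 400 == 0))              # third order: ...unless divisible by 400
}

is_gregorian_leap(c(1600, 1700, 1800, 1900, 2000, 2100))
# [1]  TRUE FALSE FALSE FALSE  TRUE FALSE

# first year after the 1582 reform where the third-order correction applies
years <- 1583:2000
min(years[years %% 400 == 0])
# [1] 1600
```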

As indicated above, the book is mostly about programming, which means in my case that some sections are definitely beyond my reach! For instance, the sentence “the onus is on the person writing the calling function to avoid using a reference object as the argument to an existing function that expects a named list” is not immediately clear to me… Nonetheless, most sections are readable [at my level] and enlightening about the mottoes “everything that exists is an object” and “everything that happens is a function” repeated throughout. (And about my psycho-rigid ways of translating Pascal into every other language!) I obviously learned about new commands and notions, like the difference between

x <- 3

and

x <<- 3

(but I was disappointed to learn that the number of <’s was not related to the depth or height of the allocation!) In particular, I found fascinating the part about replacement, which explains how a command like

diag(x)[i] = 3

could modify x directly. (While definitely worth reading, the chapter on R packages could have benefited from more details. But as Chambers points out there are whole books about this.) Overall, I am afraid the book will not improve my (limited) way of programming in R but I definitely recommend it to anyone even moderately skilled in the language.
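Both points can be illustrated in a few lines of base R. In the sketch below (all names hypothetical), the counter closure shows `<<-` updating a variable found in an enclosing environment rather than creating a local one, and the replacement example shows that `diag(x)[i] <- 3` ends up calling the replacement function `` `diag<-` ``. The expansion in the comment follows the pattern the R Language Definition gives for `names(x)[3] <- ...`, transposed here to `diag`.

```r
# `<<-` searches the enclosing environments for an existing `count`
# (falling back to the global environment), instead of creating a local copy
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
counter()   # prints [1] 1
counter()   # prints [1] 2

# Replacement: diag(x)[2] <- 3 is evaluated roughly as
#   `*tmp*` <- x
#   x <- `diag<-`(`*tmp*`, value = `[<-`(diag(`*tmp*`), 2, value = 3))
x <- diag(3)          # 3 x 3 identity matrix
diag(x)[2] <- 3
diag(x)               # prints [1] 1 3 1

# the same result, calling the replacement functions explicitly
y <- diag(3)
y <- `diag<-`(y, value = `[<-`(diag(y), 2, value = 3))
identical(x, y)       # prints [1] TRUE
```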

## can we trust computer simulations?

Posted in Books, pictures, Statistics, University life on July 10, 2015 by xi'an

How can one validate the outcome of a simulation model? Or can we even imagine validation of this outcome? This was the starting question for the conference I attended in Hannover. Which obviously engaged me to the utmost. Relating to some past experiences like advising a student working on accelerated tests for fighter electronics. And failing to agree with him on validating a model to translate those accelerated tests into a realistic setting. Or reviewing this book on climate simulation three years ago while visiting Monash University. Since I discuss most talks of the day in detail below, here is an opportunity to opt out!