## inverse Gaussian trick [or treat?]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , , , , , on October 29, 2020 by xi'an When preparing my mid-term exam for my undergrad mathematical statistics course, I wanted to use the inverse Gaussian distribution IG(μ,λ) as an example of exponential family and include a random generator question. As shown above by a Fortran computer code from Michael, Schucany and Haas, a simple version can be based on simulating a χ²(1) variate and solving in x the following second degree polynomial equation $\dfrac{\lambda(x-\mu)^2}{\mu^2 x} = v$

since the left-hand side transform is distributed as a χ²(1) random variable. The smallest root x¹, less than μ, is then chosen with probability μ/(μ+x¹) and the largest one, x²=μ²/x¹ with probability x¹/(μ+x¹). A relatively easy question then, except when one considers asking for the proof of the χ²(1) result, which proved itself to be a harder cookie than expected! The paper usually referred to for the result, Schuster (1968), is quite cryptic on the matter, essentially stating that the above can be expressed as the (bijective) transform of Y=min(X,μ²/X) and that V~χ²(1) follows immediately. I eventually worked out a proof by the “law of the unconscious statistician” [a name I do not find particularly amusing!], but did not include the question in the exam. But I found it fairly interesting that the inverse Gaussian can be generating by “inverting” the above equation, i.e. going from a (squared) Gaussian variate V to the inverse Gaussian variate X. (Even though the name stems from the two cumulant generating functions being inverses of one another.)

## artificial EM

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , on October 28, 2020 by xi'an

When addressing an X validated question on the use of the EM algorithm when estimating a Normal mean, my first comment was that it was inappropriate since there is no missing data structure to anchor by (right preposition?). However I then reflected upon the infinite number of ways to demarginalise the normal density into a joint density

$$∫ f(x,z;μ)dz = φ(x–μ)$$

from the (slice sampler) call to an indicator function for $$f(x,z;μ)$$ to a joint Normal distribution with an arbitrary correlation. While the joint Normal representation produces a sequence converging to the MLE, the slice representation utterly fails as the indicator functions make any starting value of $$μ$$ a fixed point for EM.

Incidentally, when quoting from Wikipedia on the purpose of the EM algorithm, the following passage

Finding a maximum likelihood solution typically requires taking the derivatives of the likelihood function with respect to all the unknown values, the parameters and the latent variables, and simultaneously solving the resulting equations.

struck me as confusing and possibly wrong since it seems to suggest to seek a maximum in both the parameter and the latent variables. Which does not produce the same value as the observed likelihood maximisation.

## racism, discrimination and statistics – examining the history [at the RSS]

Posted in Books, Statistics, University life with tags , , , , , on October 23, 2020 by xi'an The Royal Statistical Society is holding an on-line round table on “Racism, discrimination and statistics – examining the history” on 30 October, at 4pm UK time. The chair is RSS President Deborah Ashby and the speakers are

• John Aldrich – chair of the RSS History Section
• Angela Saini – science journalist
• Stephen Senn – Fisher Memorial Trust

## parking riddle

Posted in Books, Kids, pictures, R, Statistics, Travel with tags , , , , , , on October 23, 2020 by xi'an

The Riddler of this week had a quick riddle: if one does want to avoid parallel parking a car over a six spot street, either the first spot is available or two consecutive spots are free. What is the probability this happens with 4 other cars already parked (at random)?

While a derivation by combinatorics easily returns 9/15 as the probability to fail to park, a straightforward R code does as well

 l=0
for(t in 1:1e6){
k=sort(sample(0:5,4))
l=l+1*(!!k|3%in%diff(k)|!k%%3)}


since


> round(choose(6,2)*F/1e6)
 10


## marginal likelihood with large amounts of missing data

Posted in Books, pictures, Statistics with tags , , , , , , , , on October 20, 2020 by xi'an

In 2018, Panayiota Touloupou, research fellow at Warwick, and her co-authors published a paper in Bayesian analysis that somehow escaped my radar, despite standing in my first circle of topics of interest! They construct an importance sampling approach to the approximation of the marginal likelihood, the importance function being approximated from a preliminary MCMC run, and consider the special case when the sampling density (i.e., the likelihood) can be represented as the marginal of a joint density. While this demarginalisation perspective is rather usual, the central point they make is that it is more efficient to estimate the sampling density based on the auxiliary or latent variables than to consider the joint posterior distribution of parameter and latent in the importance sampler. This induces a considerable reduction in dimension and hence explains (in part) why the approach should prove more efficient. Even though the approximation itself is costly, at about 5 seconds per marginal likelihood. But a nice feature of the paper is to include the above graph that includes both computing time and variability for different methods (the blue range corresponding to the marginal importance solution, the red range to RJMCMC and the green range to Chib’s estimate). Note that bridge sampling does not appear on the picture but returns a variability that is similar to the proposed methodology.