Archive for The American Statistician

Monte Carlo swindles

Posted in Statistics with tags , , , , , , , , , on April 2, 2023 by xi'an

While reading Boos and Hugues-Olivier’s 1998 American Statistician paper on the applications of Basu’s theorem I can across the notion of Monte Carlo swindles. Where a reduced variance can be achieved without the corresponding increase in Monte Carlo budget. For instance, approximating the variance of the median statistic Μ for a Normal location family can be sped up by considering that

\text{var}(M)=\text{var}(M-\bar X)+\text{var}(\bar X)

by Basu’s theorem. However, when reading the originating 1973 paper by Gross (although the notion is presumably due to Tukey), the argument boils down to Rao-Blackwellisation (without the Rao-Blackwell theorem being mentioned). The related 1985 American Statistician paper by Johnstone and Velleman exploits a latent variable representation. It also makes the connection with the control variate approach, noticing the appeal of using the score function as a (standard) control and (unusual) swindle, since its expectation is zero. I am surprised at uncovering this notion only now… Possibly because the method only applies in special settings.

A side remark from the same 1998 paper, namely that the enticing decomposition

\mathbb E[(X/Y)^k] = \mathbb E[X^k] \big/ \mathbb E[Y^k]

when X/Y and Y are independent, should be kept out of reach from my undergraduates at all costs, as they would quickly get rid of the assumption!!!

Bayesian sampling without tears

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , , , , on May 24, 2022 by xi'an

Following a question on Stack Overflow trying to replicate a figure from the paper written by Alan Gelfand and Adrian Smith (1990) for The American Statistician, Bayesian sampling without tears, which precedes their historical MCMC papers, I looked at the R code produced by the OP and could not spot an issue as to why their simulation did not fit the posterior produced in the paper. Which proposes acceptance-rejection and sampling-importance-resampling as two solutions to approximately simulate from the posterior. The later being illustrated by simulations from the prior being weighted by the likelihood… The illustration is made of 3 observations from the sum of two Binomials with different success probabilities, θ¹ and θ². With a Uniform prior on both.

for (i in 1:N)
  for (k in 1:3){
    llh<-0
    for (j in max(0,n2[k]-y[k]):min(y[k],n1[k]))
      llh<-llh+choose(n1[k],j)*choose(n2[k],y[k]-j)*
         theta[i,1]^j*(1-theta[i,1])^(n1[k]-j)*theta[i,2]^(y[k]-j)*
         (1-theta[i,2])^(n2[k]-y[k]+j)
    l[i]=l[i]*llh}

To double-check, I also wrote a Gibbs version:

theta=matrix(runif(2),nrow=T,ncol=2)
x1=rep(NA,3)
for(t in 1:(T-1)){
   for(j in 1:3){
     a<-max(0,n2[j]-y[j]):min(y[j],n1[j])
     x1[j]=sample(a,1,
        prob=choose(n1[j],a)*choose(n2[j],y[j]-a)*
           theta[t,1]^a*(1-theta[t,1])^(n1[j]-a)*
           theta[t,2]^(y[j]-a)*(1-theta[t,2])^(n2[j]-y[j]+a)
           )}
      theta[t+1,1]=rbeta(1,sum(x1)+1,sum(n1)-sum(x1)+1)
      theta[t+1,2]=rbeta(1,sum(y)-sum(x1)+1,sum(n2)-sum(y)+sum(x1)+1)}

which did not show any difference with the above. Nor with the likelihood surface.

ISBA 2021.3

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , on July 1, 2021 by xi'an

Now on the third day which again started early with a 100% local j-ISBA session. (After a group run to and around Mont Puget, my first real run since 2020!!!) With a second round of talks by junior researchers from master to postdoc level. Again well-attended. A talk about Bayesian non-parametric sequential taxinomy by Alessandro Zito used the BayesANT acronym, which reminded me of the new vave group Adam and the Ants I was listening to forty years ago, in case they need a song as well as a logo! (Note that BayesANT is also used for a robot using Bayesian optimisation!) And more generally a wide variety in the themes. Thanks to the j-organisers of this 100% live session!

The next session was on PDMPs, which I helped organise, with Manon Michel speaking from Marseille, exploiting the symmetry around the gradient, which is distribution-free! Then, remotely, Kengo Kamatani, speaking from Tokyo, who expanded the high-dimensional scaling limit to the Zig-Zag sampler, exhibiting an argument against small refreshment rates, and Murray Pollock, from Newcastle, who exposed quite clearly the working principles of the Restore algorithm, including why coupling from the past was available in this setting. A well-attended session despite the early hour (in the USA).

Another session of interest for me [which I attended by myself as everyone else was at lunch in CIRM!] was the contributed C16 on variational and scalable inference that included a talk on hierarchical Monte Carlo fusion (with my friends Gareth and Murray as co-authors), Darren’s call to adopt functional programming in order to save Bayesian computing from extinction, normalising flows for modularisation, and Dennis’ adversarial solutions for Bayesian design, avoiding the computation of the evidence.

Wes Johnson’s lecture was about stories with setting prior distributions based on experts’ opinions. Which reminded me of the short paper Kaniav Kamary and myself wrote about ten years ago, in response to a paper on the topic in the American Statistician. And could not understand the discrepancy between two Bayes factors based on Normal versus Cauchy priors, until I was told they were mistakenly used repeatedly.

Rushing out of dinner, I attended both the non-parametric session (live with Marta and Antonio!) and the high-dimension computational session on Bayesian model choice (mute!). A bit of a schizophrenic moment, but allowing to get a rough picture in both areas. At once. Including an adaptive MCMC scheme for selecting models by Jim Griffin. Which could be run directly over the model space. With my ever-going wondering at the meaning of neighbour models.

double if not exponential

Posted in Books, Kids, Statistics, University life with tags , , , , , , on December 10, 2020 by xi'an

In one of my last quizzes for the year, as the course is about to finish, I asked whether mean or median was the MLE for a double exponential sample of odd size, without checking for the derivation of the result, as I was under the impression it was a straightforward result. Despite being outside exponential families. As my students found it impossible to solve within the allocated 5 minutes, I had a look, could not find an immediate argument (!), and used instead this nice American Statistician note by Robert Norton based on the derivative being the number of observations smaller than θ minus the number of observations larger than θ.  This leads to the result as well as the useful counter-example of a range of MLE solutions when the number of observations is even.

inverse Gaussian trick [or treat?]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , , , , , on October 29, 2020 by xi'an

When preparing my mid-term exam for my undergrad mathematical statistics course, I wanted to use the inverse Gaussian distribution IG(μ,λ) as an example of exponential family and include a random generator question. As shown above by a Fortran computer code from Michael, Schucany and Haas, a simple version can be based on simulating a χ²(1) variate and solving in x the following second degree polynomial equation

\dfrac{\lambda(x-\mu)^2}{\mu^2 x} = v

since the left-hand side transform is distributed as a χ²(1) random variable. The smallest root x¹, less than μ, is then chosen with probability μ/(μ+x¹) and the largest one, x²=μ²/x¹ with probability x¹/(μ+x¹). A relatively easy question then, except when one considers asking for the proof of the χ²(1) result, which proved itself to be a harder cookie than expected! The paper usually referred to for the result, Schuster (1968), is quite cryptic on the matter, essentially stating that the above can be expressed as the (bijective) transform of Y=min(X,μ²/X) and that V~χ²(1) follows immediately. I eventually worked out a proof by the “law of the unconscious statistician” [a name I do not find particularly amusing!], but did not include the question in the exam. But I found it fairly interesting that the inverse Gaussian can be generating by “inverting” the above equation, i.e. going from a (squared) Gaussian variate V to the inverse Gaussian variate X. (Even though the name stems from the two cumulant generating functions being inverses of one another.)

%d bloggers like this: