Archive for Alan Gelfand

Bayesian sampling without tears

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , , , , on May 24, 2022 by xi'an

Following a question on Stack Overflow trying to replicate a figure from the paper written by Alan Gelfand and Adrian Smith (1990) for The American Statistician, Bayesian sampling without tears, which precedes their historical MCMC papers, I looked at the R code produced by the OP and could not spot an issue as to why their simulation did not fit the posterior produced in the paper. Which proposes acceptance-rejection and sampling-importance-resampling as two solutions to approximately simulate from the posterior. The later being illustrated by simulations from the prior being weighted by the likelihood… The illustration is made of 3 observations from the sum of two Binomials with different success probabilities, θ¹ and θ². With a Uniform prior on both.

for (i in 1:N)
  for (k in 1:3){
    llh<-0
    for (j in max(0,n2[k]-y[k]):min(y[k],n1[k]))
      llh<-llh+choose(n1[k],j)*choose(n2[k],y[k]-j)*
         theta[i,1]^j*(1-theta[i,1])^(n1[k]-j)*theta[i,2]^(y[k]-j)*
         (1-theta[i,2])^(n2[k]-y[k]+j)
    l[i]=l[i]*llh}

To double-check, I also wrote a Gibbs version:

theta=matrix(runif(2),nrow=T,ncol=2)
x1=rep(NA,3)
for(t in 1:(T-1)){
   for(j in 1:3){
     a<-max(0,n2[j]-y[j]):min(y[j],n1[j])
     x1[j]=sample(a,1,
        prob=choose(n1[j],a)*choose(n2[j],y[j]-a)*
           theta[t,1]^a*(1-theta[t,1])^(n1[j]-a)*
           theta[t,2]^(y[j]-a)*(1-theta[t,2])^(n2[j]-y[j]+a)
           )}
      theta[t+1,1]=rbeta(1,sum(x1)+1,sum(n1)-sum(x1)+1)
      theta[t+1,2]=rbeta(1,sum(y)-sum(x1)+1,sum(n2)-sum(y)+sum(x1)+1)}

which did not show any difference with the above. Nor with the likelihood surface.

a come-back of the harmonic mean estimator

Posted in Statistics with tags , , , , , , on September 6, 2018 by xi'an

Are we in for a return of the harmonic mean estimator?! Allen Caldwell and co-authors arXived a new document that Allen also sent me, following a technique that offers similarities with our earlier approach with Darren Wraith, the difference being in the more careful and practical construct of the partition set and use of multiple hypercubes, which is the smart thing. I visited Allen’s group at the Max Planck Institut für Physik (Heisenberg) in München (Garching) in 2015 and we confronted our perspectives on harmonic means at that time. The approach followed in the paper starts from what I would call the canonical Gelfand and Dey (1995) representation with a uniform prior, namely that the integral of an arbitrary non-negative function [or unnormalised density] ƒ can be connected with the integral of the said function ƒ over a smaller set Δ with a finite measure measure [or volume]. And therefore to simulations from the density ƒ restricted to this set Δ. Which can be recycled by the harmonic mean identity towards producing an estimate of the integral of ƒ over the set Δ. When considering a partition, these integrals sum up to the integral of interest but this is not necessarily the only exploitation one can make of the fundamental identity. The most novel part stands in constructing an adaptive partition based on the sample, made of hypercubes obtained after whitening of the sample. Only keeping points with large enough density and sufficient separation to avoid overlap. (I am unsure a genuine partition is needed.) In order to avoid selection biases the original sample is separated into two groups, used independently. Integrals that stand too much away from the others are removed as well. This construction may sound a bit daunting in the number of steps it involves and in the poor adequation of a Normal to an hypercube or conversely, but it seems to shy away from the number one issue with the basic harmonic mean estimator, the almost certain infinite variance. Although it would be nice to be completely certain this doom is avoided. I still wonder at the degenerateness of the approximation of the integral with the dimension, as well as at other ways of exploiting this always fascinating [if fraught with dangers] representation. And comparing variances.

Alan Gelfand in Paris

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , on May 11, 2017 by xi'an

Alan Gelfand (Duke University) will be in Paris on the week of May 15 and give several seminars, including one at AgroParisTech on May 16:

Modèles hiérarchiques

and on at CREST (BiPS)  on May 18, 2pm:

Scalable Gaussian processes for analyzing space and space-time datasets

recycling Gibbs auxiliaries

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , on December 6, 2016 by xi'an

wreck of the S.S. Dicky, Caloundra beach, Qld, Australia, Aug. 19, 2012Luca Martino, Victor Elvira and Gustau Camps-Valls have arXived a paper on recycling for Gibbs sampling. The argument therein is to take advantage of all simulations induced by MCMC simulation for one full conditional, towards improving estimation if not convergence. The context is thus one when Metropolis-within-Gibbs operates, with several (M) iterations of the corresponding Metropolis being run instead of only one (which is still valid from a theoretical perspective). While there are arguments in augmenting those iterations, as recalled in the paper, I am not a big fan of running a fixed number of M of iterations as this does not approximate better the simulation from the exact full conditional and even if this approximation was perfect, the goal remains simulating from the joint distribution. As such, multiplying the number of Metropolis iterations does not necessarily impact the convergence rate, only brings it closer to the standard Gibbs rate. Moreover, the improvement does varies with the chosen component, meaning that the different full conditionals have different characteristics that produce various levels of variance reduction:

  • if the targeted expectation only depends on one component of the Markov chain, multiplying the number of simulations for the other components has no clear impact, except in increasing time;
  • if the corresponding full conditional is very concentrated, repeating simulations should produce quasi-repetitions, and no gain.

The only advantage in computing time that I can see at this stage is when constructing the MCMC sampler for the full proposal is much more costly than repeating MCMC iterations, which are then almost free and contribute to the reduction of the variance of the estimator.

This analysis of MCMC-withing-Gibbs strategies reminds me of a recent X validated question, which was about the proper degree of splitting simulations from a marginal and from a corresponding conditional in the chain rule, the optimal balance being in my opinion dependent on the relative variances of the conditional expectations.

A last point is that recycling in the context of simulation and Monte Carlo methodology makes me immediately think of Rao-Blackwellisation, which is surprisingly absent from the current paperRao-Blackwellisation was introduced in the MCMC literature and to the MCMC community in the first papers of Alan Gelfand and Adrian Smith, in 1990. While this is not always producing a major gain in Monte Carlo variability, it remains a generic way of recycling auxiliary variables as shown, e.g., in the recycling paper we wrote with George Casella in 1996, one of my favourite papers.

%d bloggers like this: