Archive for the Statistics Category

unbiased estimator of determinant

Posted in Books, Statistics with tags , , , , on July 3, 2020 by xi'an

Just saw this new [one page] posting on arXiv, meaning an unbiased estimate of the determinant can be derived much faster. If less reliably. This trick can be helpful for (pseudo-marginal) MCMC steps when the determinant itself is of limited interest… (The importance version is not truly needed!)

Markov melding

Posted in Books, Statistics, University life with tags , , , on July 2, 2020 by xi'an

“An alternative approach is to model smaller, simpler aspects of the data, such that designing these submodels is easier, then combine the submodels.”

An interesting paper by Andrew Manderson and Robert Goudie I read on arXiv on merging (or melding) several models together. With different data and different parameters. The assumption is one of a common parameter φ shared by all (sub)models. Since the product of the joint distributions across the m submodels involves m replicates of φ, the melded distribution is the product of the conditional distributions given φ, times a common (or pooled) prior on φ. Which leads to a perfectly well-defined joint distribution provided the support of this pooled prior is compatible with all conditionals.

The MCMC aspects of such a target are interesting in that the submodels can easily be exploited to return proposal distributions on their own parameters (plus φ). Although the notion is fraught with danger when considering a flat prior on φ, since the posterior is not necessarily well-defined. Or at the very least unrelated with the actual marginal posterior. This first stage is used to build a particle approximation to the posterior distribution of φ, exploited in the later simulation of the other subsample parameters and updates of φ. Due to the rare availability of the (submodel) marginal prior on φ, it is replaced in the paper by a kernel density estimate. Not a great idea as (a) it is unstable and (b) the joint density is costly, while existing! Which brings the authors to set a goal of estimating a ratio. Of the same marginal density in two different values of φ. (Not our frequent problem of the ratio of different marginals!) They achieve this by targeting another joint, using a weight function both for the simulation and the kernel density estimation… Requiring the calibration of the weight function and the production of a biased estimate of the ratio.

While the paper concentrates very much on computational improvements, including the possible recourse to unbiased MCMC, I also feel it is missing on the Bayesian aspects, since the construction of the multi-level Bayesian model faces many challenges. In a sense this is an alternative to our better together paper, where cuts are used to avoid the duplication of common parameters.

Le Monde puzzle [#1149]

Posted in Books, Kids, pictures, R with tags , , , , , , on July 1, 2020 by xi'an

The weekly puzzle from Le Monde is a leaking variant on an old puzzle:

Three buckets have capacities of 8, 5 and 3 litres, respectively. At the start of the game, the 8 litre bucket is full and both others are empty. Aiming at reaching exactly 4 litres in one bucket, water is transferred between buckets. However, a fraction 1/k is lost with each transfer. If k=9, it is possible to reach 4 litres in three operations? If k=7, is it at all possible to reach 4 litres?

By sheer random search

k=1/5
z=c(8,5,3)
m<-function(s){
  i=sample(1:3,2)
  s[i]=s[i]+ifelse(
   rep((a<-z[i[1]]-s[i[1]])<(b<-s[i[2]])*(1-k),2),
   a*c(1,-1-k),b*c(1-k,-1))
  s}

I found that most fractions allow to reach 4 litres starting with k=2. (And am unsure the missing ones, like 18 or 21 are not due to a lack of luck… In particular, for k=9, the shortest path is

 8.000    0 0
 2.375    5 0
 0.000    5 2.11
 0.000    4 3

one bridge further

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , on June 30, 2020 by xi'an

Jackie Wong, Jon Forster (Warwick) and Peter Smith have just published a paper in Statistics & Computing on bridge sampling bias and improvement by splitting.

“… known to be asymptotically unbiased, bridge sampling technique produces biased estimates in practical usage for small to moderate sample sizes (…) the estimator yields positive bias that worsens with increasing distance between the two distributions. The second type of bias arises when the approximation density is determined from the posterior samples using the method of moments, resulting in a systematic underestimation of the normalizing constant.”

Recall that bridge sampling is based on a double trick with two samples x and y from two (unnormalised) densities f and g that are interverted in a ratio

m \sum_{i=1}^n g(x_i)\omega(x_i) \Big/ n \sum_{i=1}^m f(y_i)\omega(y_i)

of unbiased estimators of the inverse normalising constants. Hence biased. The more the less similar these two densities are. Special cases for ω include importance sampling [unbiased] and reciprocal importance sampling. Since the optimal version of the bridge weight ω is the inverse of the mixture of f and g, it makes me wonder at the performance of using both samples top and bottom, since as an aggregated sample, they also come from the mixture, as in Owen & Zhou (2000) multiple importance sampler. However, a quick try with a positive Normal versus an Exponential with rate 2 does not show an improvement in using both samples top and bottom (even when using the perfectly normalised versions)

morc=(sum(f(y)/(nx*dnorm(y)+ny*dexp(y,2)))+
            sum(f(x)/(nx*dnorm(x)+ny*dexp(x,2))))/(
  sum(g(x)/(nx*dnorm(x)+ny*dexp(x,2)))+
         sum(g(y)/(nx*dnorm(y)+ny*dexp(y,2))))

at least in terms of bias… Surprisingly (!) the bias almost vanishes for very different samples sizes either in favour of f or in favour of g. This may be a form of genuine defensive sampling, who knows?! At the very least, this ensures a finite variance for all weights. (The splitting approach introduced in the paper is a natural solution to create independence between the first sample and the second density. This reminded me of our two parallel chains in AMIS.)

non-reversible jump MCMC

Posted in Books, pictures, Statistics with tags , , , , , , , on June 29, 2020 by xi'an

Philippe Gagnon and et Arnaud Doucet have recently arXived a paper on a non-reversible version of reversible jump MCMC, the methodology introduced by Peter Green in 1995 to tackle Bayesian model choice/comparison/exploration. Whom Philippe presented at BayesComp20.

“The objective of this paper is to propose sampling schemes which do not suffer from such a diffusive behaviour by exploiting the lifting idea (…)”

The idea is related to lifting, creating non-reversible behaviour by adding a direction index (a spin) to the exploration of the models, assumed to be totally ordered, as with nested models (mixtures, changepoints, &tc.).  As with earlier versions of lifting, the chain proceeds along one (spin) direction until the proposal is rejected in which case the spin spins. The acceptance probability in the event of a change of model (upwards or downwards) is essentially the same as the reversible one (meaning it includes the dreaded Jacobian!). The original difficulty with reversible jump remains active with non-reversible jump in that the move from one model to the next must produce plausible values. The paper recalls two methods proposed by Christophe Andrieu and his co-authors. One consists in buffering a tempering sequence, but this proves costly.  Pursuing the interesting underlying theme that both reversible and non-reversible versions are noisy approximations of the marginal ratio, the other one consists in marginalising out the parameter to approximate the marginal probability of moving between nearby models. Combined with multiple choice to preserve stationarity and select more likely moves at the same time. Still requiring a multiplication of the number of simulations but parallelisable. The paper contains an exact comparison result that non-reversible jump leads to a smaller asymptotic variance than reversible jump, but it is unclear to me whether or not this accounts for the extra computing time resulting from the multiple paths in the proposed algorithms. (Even though the numerical illustration shows an improvement brought by the non-reversible side for the same computational budget.)