## conditional sampling

Posted in R, Statistics with tags , , , , on September 5, 2016 by xi'an

An interesting question about stratified sampling came up on X validated last week, namely how to optimise a Monte Carlo estimate based on two subsequent simulations, one, X, from a marginal and one or several Y from the corresponding conditional given X, when the costs of producing those two simulations significantly differ. When looking at the variance of the Monte Carlo estimate, this variance can be minimised in the number of simulations from the marginal under a computing budget. However, when the costs of producing x, y given or (x,y) are about the same, it does not pay to replicate simulations from y given x or x given y, because this increases the randomness of the estimator and induces positive correlation between some terms in the sum. Assuming that the conditional variances are always computable, we could envision an extension to the question where for each new value of a simulated x (or y), a variable number of conditionally simulated y (or x) could be produced. Or even when those conditional variances are unknown but can be replaced with empirical versions.

The illustration comes from a bivariate normal model with correlation, for a rather arbitrary function , but the pattern remains the same, namely that iid simulations of the pair (X,Y invariably leads to a smaller variance of the estimator compared with simulation with a 1:10 (10 Y’s for one X) or 10:1 ratio between x’s and y’s. Depending on the function and the relative variances, the 1:10 or 10:1 schemes may have a similar variability.

zigma=c(9,1,-.9*sqrt(1*9))

geney=function(x,n=1){
return(rnorm(n,mean=zigma[3]*x/zigma[1],sd=sqrt(zigma[2]-
zigma[3]^2/zigma[1])))}
genex=function(y,n=1){
return(rnorm(n,mean=zigma[3]*y/zigma[2],sd=sqrt(zigma[1]-
zigma[3]*zigma[3]/zigma[2])))}
targ=function(x,y){ log(x^2*y^4)+x^2*cos(x^2)/y*sin(y^2)}

T=1e4;N=1e3
vales=matrix(0,N,3)
for (i in 1:N){
xx=rnorm(T,sd=sqrt(zigma[1]))
vales[i,1]=mean(targ(xx,geney(xx,n=T)))
xx=rep(rnorm(T/10,sd=sqrt(zigma[1])),10)
vales[i,2]=mean(targ(xx,geney(xx,n=T)))
yy=rep(rnorm(T/10,sd=sqrt(zigma[2])),10)
vales[i,3]=mean(targ(enex(yy,n=T),yy))}


## MCqMC 2016 [#4]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on August 21, 2016 by xi'an

In his plenary talk this morning, Arnaud Doucet discussed the application of pseudo-marginal techniques to the latent variable models he has been investigating for many years. And its limiting behaviour towards efficiency, with the idea of introducing correlation in the estimation of the likelihood ratio. Reducing complexity from O(T²) to O(T√T). With the very surprising conclusion that the correlation must go to 1 at a precise rate to get this reduction, since perfect correlation would induce a bias. A massive piece of work, indeed!

The next session of the morning was another instance of conflicting talks and I hoped from one room to the next to listen to Hani Doss’s empirical Bayes estimation with intractable constants (where maybe SAME could be of interest), Youssef Marzouk’s transport maps for MCMC, which sounds like an attractive idea provided the construction of the map remains manageable, and Paul Russel’s adaptive importance sampling that somehow sounded connected with our population Monte Carlo approach. (With the additional step of considering transform maps.)

An interesting item of information I got from the final announcements at MCqMC 2016 just before heading to Monash, Melbourne, is that MCqMC 2018 will take place in the city of Rennes, Brittany, on July 2-6. Not only it is a nice location on its own, but it is most conveniently located in space and time to attend ISBA 2018 in Edinburgh the week after! Just moving from one Celtic city to another Celtic city. Along with other planned satellite workshops, this occurrence should make ISBA 2018 more attractive [if need be!] for participants from oversea.

## MCqMC 2016 [#2]

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , on August 17, 2016 by xi'an

In her plenary talk this morning, Christine Lemieux discussed connections between quasi-Monte Carlo and copulas, covering a question I have been considering for a while. Namely, when provided with a (multivariate) joint cdf F, is there a generic way to invert a vector of uniforms [or quasi-uniforms] into a simulation from F? For Archimedian copulas (as we always can get back to copulas), there is a resolution by the Marshall-Olkin representation,  but this puts a restriction on the distributions F that can be considered. The session on synthetic likelihoods [as introduced by Simon Wood in 2010] put together by Scott Sisson was completely focussed on using normal approximations for the distribution of the vector of summary statistics, rather than the standard ABC non-parametric approximation. While there is a clear (?) advantage in using a normal pseudo-likelihood, since it stabilises with much less simulations than a non-parametric version, I find it difficult to compare both approaches, as they lead to different posterior distributions. In particular, I wonder at the impact of the dimension of the summary statistics on the approximation, in the sense that it is less and less likely that the joint is normal as this dimension increases. Whether this is damaging for the resulting inference is another issue, possibly handled by a supplementary ABC step that would take the first-step estimate as summary statistic. (As a side remark, I am intrigued at everyone being so concerned with unbiasedness of methods that are approximations with no assessment of the amount of approximation!) The last session of the day was about multimodality and MCMC solutions, with talks by Hyungsuk Tak, Pierre Jacob and Babak Shababa, plus mine. Hunsuk presented the RAM algorithm I discussed earlier under the title of “love-hate” algorithm, which was a kind reference to my post! (I remain puzzled by the ability of the algorithm to jump to another mode, given that the intermediary step aims at a low or even zero probability region with an infinite mass target.) And Pierre talked about using SMC for Wang-Landau algorithms, with a twist to the classical stochastic optimisation schedule that preserves convergence. And a terrific illustration on a distribution inspired from the Golden Gate Bridge that reminded me of my recent crossing! The discussion around my folded Markov chain talk focussed on the extension of the partition to more than two sets, the difficulty being in generating automated projections, with comments about connections with computer graphic tools. (Too bad that the parallel session saw talks by Mark Huber and Rémi Bardenet that I missed! Enjoying a terrific Burmese dinner with Rémi, Pierre and other friends also meant I could not post this entry on time for the customary 00:16. Not that it matters in the least…)

## MCqMC 2016 [#1]

Posted in Statistics, Travel, University life with tags , , , , , , , , on August 16, 2016 by xi'an

This week, I attend the MCqMC 2016 conference in Stanford, which is quite an exciting gathering of researchers involved in various aspects of Monte Carlo methods. As Art Owen put it in his welcoming talk, the whole Carlo family is there! (Not to mention how pleasant the Stanford Campus currently is, after the scorching heat we met the past week in Northern California inlands.) My talk is on folded Markov chains, which is a proposal Randal and I have been working on for quite a while, with Gareth joining us more recently. The basic idea was inspired from a discussion I had about a blog post, so long ago that I cannot even trace it! Namely, when defining an inside set A and an outside set, such that the outside set can be projected onto the inside set, one can fold both the target and the proposal, essentially looking at a collection of values for each step of the Markov chain. In other words, the problem can be reduced to A at essentially no cost and with the benefits of a compact support A and of a possibly uniformly ergodic Markov chain. We are still working on the paper, but the idea is both cool and straightforward, so we decided to talk about it at Nordstat 2016 and now MCqMC 2016.

## inefficiency of data augmentation for large samples

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , on May 31, 2016 by xi'an

On Monday, James Johndrow, Aaron Smith, Natesh Pillai, and David Dunson arXived a paper on the diminishing benefits of using data augmentation for large and highly imbalanced categorical data. They reconsider the data augmentation scheme of Tanner and Wong (1987), surprisingly not mentioned, used in the first occurrences of the Gibbs sampler like Albert and Chib’s (1993) or our mixture estimation paper with Jean Diebolt (1990). The central difficulty with data augmentation is that the distribution to be simulated operates on a space that is of order O(n), even when the original distribution covers a single parameter. As illustrated by the coalescent in population genetics (and the subsequent intrusion of the ABC methodology), there are well-known cases when the completion is near to impossible and clearly inefficient (as again illustrated by the failure of importance sampling strategies on the coalescent). The paper provides spectral gaps for the logistic and probit regression completions, which are of order a power of log(n) divided by √n, when all observations are equal to one. In a somewhat related paper with Jim Hobert and Vivek Roy, we studied the spectral gap for mixtures with a small number of observations: I wonder at the existence of a similar result in this setting, when all observations stem from one component of the mixture, when all observations are one. The result in this paper is theoretically appealing, the more because the posteriors associated with such models are highly regular and very close to Gaussian (and hence not that challenging as argued by Chopin and Ridgway). And because the data augmentation algorithm is uniformly ergodic in this setting (as we established with Jean Diebolt  and later explored with Richard Tweedie). As demonstrated in the  experiment produced in the paper, when comparing with HMC and Metropolis-Hastings (same computing times?), which produce much higher effective sample sizes.

## a Bernoulli factory of sorts?

Posted in Books, Kids, Statistics with tags , , , , , on May 10, 2016 by xi'an

A nice question was posted on X validated as to figure out a way to simulate a Bernoulli B(q) variate when using only a Bernoulli B(p) generator. With the additional question of handling the special case q=a/b, a rational probability. This is not exactly a Bernoulli factory problem in that q does not write as f(p), but still a neat challenge. My solution would have been similar to the one posted by William Huber, namely to simulate a sequence of B(p) or B(1-p) towards zooming on q until the simulation of the underlying uniforms U allows us to conclude at the position of U wrt q. For instance, if p>q and X~B(p) is equal to zero, the underlying uniform is more than p, hence more than q, leading to returning zero for the B(q) generation. Else, a second B(p) or B(1-p) generation means breaking the interval (0,p) into two parts, one of which allows for stopping the generation, and so on. The solution posted by William Huber contains an R code that could be easily improved by choosing for each interval between p and (1-p) towards the maximal probability of stopping. I still wonder at the ultimate optimal solution that would minimise the (average or median) number of calls to the Bernoulli(p) generator.

## more e’s [and R’s]

Posted in Kids, pictures, R, Statistics with tags , , , , , , , on February 22, 2016 by xi'an

Alex Thiéry suggested debiasing the biased estimate of e by Rhee and Glynn truncated series method, so I tried the method to see how much of an improvement (if any!) this would bring. I first attempted to naïvely implement the raw formula of Rhee and Glynn

$\hat{\mathfrak{e}} = \sum_{n=1}^N \{\hat{e}_{n+1}-\hat{e}_n\}\big/\mathbb{P}(N\ge n)$

with a (large) Poisson distribution on the stopping rule N, but this took ages. I then realised that the index n did not have to be absolute, i.e. to start at n=1 and proceed snailwise one integer at a time: the formula remains equally valid after a change of time, i.e. n=can start at an arbitrary value and proceeds by steps of arbitrary size, which obviously speeds things up! Continue reading