And another one borrowed from the highly popular post on the random variable [almost] always smaller than its mean!
Archive for Monte Carlo methods
Following a recent post on the topic, and comments ‘Og’s readers kindly provided on that post, the picture is not as clear as I wished it was… Indeed, on the one hand, non-parametric measures of correlation based on ranks are, as pointed out by Clara Grazian and others, invariant under monotonic transforms and hence producing a Gaussian pair or a Uniform pair with the intended rank correlation is sufficient to return a correlated sample for any pair of marginal distributions by the (monotonic) inverse cdf transform. On the other hand, if correlation is understood as Pearson linear correlation, (a) it is not always defined and (b) there does not seem to be a generic approach to simulate from an arbitrary triplet (F,G,ρ) [assuming the three entries are compatible]. When Kees pointed out Pascal van Kooten’s solution by permutation, I thought this was a terrific resolution, but after thinking about it a wee bit more, I am afraid it is only an approximation, i.e., a way to return a bivariate sample with a given empirical correlation. Not the theoretical correlation. Obviously, when the sample is very large, this comes as a good approximation. But when facing a request to simulate a single pair (X,Y), this gets inefficient [and still approximate].
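To make the rank-correlation route concrete, here is a minimal sketch, assuming the target is Spearman's rank correlation and using the standard bivariate normal identity ρ_S = (6/π) arcsin(r/2) to pick the normal correlation r. The two marginals (an Exp(1) and a Pareto with shape 3) and all function names are illustrative choices, not part of the original discussion.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def gaussian_copula_pair(rho_s, inv_F, inv_G, rng):
    """One draw (X, Y) with marginals F and G and Spearman correlation rho_s."""
    # For a bivariate normal with correlation r, Spearman's rho equals
    # (6 / pi) * asin(r / 2); invert that relation to get r.
    r = 2.0 * math.sin(math.pi * rho_s / 6.0)
    a = rng.gauss(0.0, 1.0)
    b = r * a + math.sqrt(max(0.0, 1.0 - r * r)) * rng.gauss(0.0, 1.0)
    # Clamp away from 0 and 1 so the inverse cdfs never see an endpoint.
    u = min(max(PHI(a), 1e-12), 1.0 - 1e-12)
    v = min(max(PHI(b), 1e-12), 1.0 - 1e-12)
    return inv_F(u), inv_G(v)


def ranks(values):
    """Ranks 0..n-1 of the entries (no ties expected for continuous draws)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


# Hypothetical marginals: Exp(1) for X, Pareto(shape 3, scale 1) for Y.
def inv_exp(u):
    return -math.log1p(-u)


def inv_pareto(u):
    return (1.0 - u) ** (-1.0 / 3.0)


rng = random.Random(42)
sample = [gaussian_copula_pair(0.6, inv_exp, inv_pareto, rng) for _ in range(20000)]
xs, ys = zip(*sample)
rho_hat = spearman(xs, ys)       # empirical rank correlation, target 0.6
pearson_hat = pearson(xs, ys)    # the linear correlation is a different number
```

Each call returns a single exact pair, so there is no need for a large sample as with the permutation approach. The Pearson correlation of the output depends on F and G and is not controlled by this construction.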
Now, if we aim at exact simulation from a bivariate distribution with the arbitrary triplet (F,G,ρ), why can’t we find a generic method?! I think one fundamental if obvious reason is that the question is just ill-posed. Indeed, there are many ways of defining a joint distribution with marginals F and G and with (linear) correlation ρ. One for each copula. The joint could thus be associated with a Gaussian copula, i.e., (X,Y)=(F⁻¹(Φ(A)),G⁻¹(Φ(B))) when (A,B) is a standardised bivariate normal with the proper correlation ρ’. Or it can be associated with the Archimedean copula
$C(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$,
with θ>0 defined by a (linear) correlation of ρ. Or yet with any other copula… Were the joint distribution perfectly well-defined, it would then mean that ρ’ or θ (or whatever natural parameter is used for that copula) do perfectly parametrise this distribution instead of the correlation coefficient ρ. All that remains then is to simulate directly from the copula, maybe a theme for a future post…
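As an illustration of simulating directly from a copula, here is a minimal sketch for the Archimedean (Clayton) copula written above, using conditional inversion: draw U and W uniform, then solve ∂C/∂u (U,V) = W for V. The check relies on the known value of Kendall's tau for this copula, θ/(θ+2). Function names and the choice θ=2 are illustrative assumptions.

```python
import random


def clayton_pair(theta, rng):
    """One draw (U, V) from the Clayton copula with parameter theta > 0."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u = 0
    w = 1.0 - rng.random()
    # Conditional inversion: v solves dC/du (u, v) = w.
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v


def kendall_tau(pairs):
    """Naive O(n^2) Kendall's tau, adequate for a small check."""
    concordant = discordant = 0
    n = len(pairs)
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            xj, yj = pairs[j]
            s = (xi - xj) * (yi - yj)
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


rng = random.Random(2015)
theta = 2.0
draws = [clayton_pair(theta, rng) for _ in range(2000)]
tau_hat = kendall_tau(draws)  # theoretical value is theta / (theta + 2)
mean_v = sum(v for _, v in draws) / len(draws)  # V should be uniform on (0, 1)
```

A pair with marginals F and G then follows as (F⁻¹(U), G⁻¹(V)). The natural parameter here is θ, and matching a given linear correlation ρ would require calibrating θ numerically for the specific F and G, which this sketch does not attempt.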
“Our approach in handling the model uncertainty has some resemblance to statistical ‘‘emulators’’ (Kennedy and O’Hagan, 2001), approximative methods used to express the model uncertainty when simulating data under a mechanistic model is computationally intensive. However, emulators are often motivated in the context of Gaussian processes, where the uncertainty in the model space can be reasonably well modeled by a normal distribution.”
Pierre Pudlo pointed out to me the paper AABC: Approximate approximate Bayesian computation for inference in population-genetic models by Buzbas and Rosenberg that just appeared in the first 2015 issue of Theoretical Population Biology. Despite the claim made above, including a confusion on the nature of Gaussian processes, I am rather reserved about the appeal of this AA rated ABC…
“When likelihood functions are computationally intractable, likelihood-based inference is a challenging problem that has received considerable attention in the literature (Robert and Casella, 2004).”
The ABC approach suggested therein is doubly approximate in that simulation from the sampling distribution is replaced with simulation from a cheaper substitute model, after a learning stage using the costly sampling distribution. While the approximation converges to the genuine ABC posterior when both the sample size and the Monte Carlo sample size grow to infinity, there is no correction for this approximation and I am puzzled by its construction. It seems (see p.34) that the cheaper model is built by a sort of weighted bootstrap: given a parameter simulated from the prior, weights based on its distance to a reference table are constructed and then used to create a pseudo-sample by weighted sampling from the original pseudo-samples, rather than by using a continuous kernel centred on those original pseudo-samples, as would be the suggestion for a non-parametric regression. Each pseudo-sample is accepted only when a distance between the summary statistics is small enough. This bootstrap flavour is counter-intuitive in that it requires a large enough sample from the true sampling distribution to operate with some confidence… I also wonder at what happens when the data is not iid. (I added the quote above as another source of puzzlement, since the book is about cases when the likelihood is manageable.)
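To fix ideas, here is a loose sketch of the weighted bootstrap as I read it from the description above. It is not the algorithm of Buzbas and Rosenberg: the Gaussian kernel on the parameter distance, the bandwidth, the toy normal model, and the sample mean as summary statistic are all assumptions made for the illustration.

```python
import math
import random


def costly_model(theta, m, rng):
    # Stand-in for the expensive simulator (hypothetical): m iid N(theta, 1) draws.
    return [rng.gauss(theta, 1.0) for _ in range(m)]


def prior(rng):
    # Hypothetical prior: uniform on (-5, 5).
    return rng.uniform(-5.0, 5.0)


def mean(xs):
    return sum(xs) / len(xs)


def cheap_pseudo_sample(theta, table, m, bandwidth, rng):
    """Weighted bootstrap from the reference table, no call to costly_model."""
    d2 = [((theta - t) / bandwidth) ** 2 for t, _ in table]
    d2_min = min(d2)  # shift so that the closest entry has weight 1
    weights = [math.exp(-0.5 * (d - d2_min)) for d in d2]
    # Pick a reference pseudo-sample for each new datum, then copy one of its values.
    picks = rng.choices(range(len(table)), weights=weights, k=m)
    return [rng.choice(table[i][1]) for i in picks]


def approximate_abc(observed, table, n_sims, eps, bandwidth, rng):
    s_obs = mean(observed)
    accepted = []
    for _ in range(n_sims):
        theta = prior(rng)
        pseudo = cheap_pseudo_sample(theta, table, len(observed), bandwidth, rng)
        # Usual ABC acceptance step on the summary statistic.
        if abs(mean(pseudo) - s_obs) < eps:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
m = 50
# Learning stage: the only place where the costly model is used.
table = [(t, costly_model(t, m, rng)) for t in (prior(rng) for _ in range(200))]
observed = costly_model(1.5, m, rng)
accepted = approximate_abc(observed, table, n_sims=5000, eps=0.1, bandwidth=0.3, rng=rng)
```

Every value in a cheap pseudo-sample is a copy of a value already in the reference table, which is why the table needs to be large enough for the scheme to operate with some confidence.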
A paper on the comparison of emulation methods for Approximate Bayesian Computation was recently arXived by Jabot et al. The idea is to bypass costly simulations of pseudo-data by running cheaper simulation from a pseudo-model or emulator constructed via a preliminary run of the original and costly model. To borrow from the paper introduction, ABC-Emulation runs as follows:
- design a small number n of parameter values covering the parameter space;
- generate n corresponding realisations from the model and store the corresponding summary statistics;
- build an emulator (model) based on those n values;
- run ABC using the emulator in lieu of the original model.
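
The four steps above can be sketched as follows, under strong simplifying assumptions: a scalar parameter, a single summary statistic, a toy normal model standing in for the costly one, and an emulator made of a local linear regression of the summary on the parameter with Gaussian noise of locally estimated spread. None of these choices is taken from the paper.

```python
import math
import random


def costly_summary(theta, rng, m=30):
    # Stand-in for the expensive model (hypothetical): mean of m N(theta, 1) draws.
    return sum(rng.gauss(theta, 1.0) for _ in range(m)) / m


def local_linear_emulator(design, summaries, bandwidth):
    """Return a function theta -> (predicted summary, local residual sd)."""
    def predict(theta):
        d = [t - theta for t in design]
        w = [math.exp(-0.5 * (di / bandwidth) ** 2) for di in d]
        sw = sum(w)
        swd = sum(wi * di for wi, di in zip(w, d))
        swdd = sum(wi * di * di for wi, di in zip(w, d))
        sws = sum(wi * si for wi, si in zip(w, summaries))
        swds = sum(wi * di * si for wi, di, si in zip(w, d, summaries))
        # Closed-form weighted least squares for s = intercept + slope * (t - theta).
        slope = (sw * swds - swd * sws) / (sw * swdd - swd ** 2)
        intercept = (sws - slope * swd) / sw  # fitted value at theta itself
        rss = sum(wi * (si - intercept - slope * di) ** 2
                  for wi, di, si in zip(w, d, summaries))
        return intercept, math.sqrt(rss / sw)
    return predict


rng = random.Random(7)

# Step 1: design n parameter values covering the parameter space (-5, 5).
n = 51
design = [-5.0 + 10.0 * i / (n - 1) for i in range(n)]

# Step 2: n runs of the costly model, keeping only the summary statistics.
summaries = [costly_summary(t, rng) for t in design]

# Step 3: build the emulator from those n values.
emulator = local_linear_emulator(design, summaries, bandwidth=0.5)

# Step 4: ABC with the emulator in lieu of the original model.
s_obs = costly_summary(2.0, rng)  # pretend this is the observed summary
accepted = []
for _ in range(5000):
    theta = rng.uniform(-5.0, 5.0)       # hypothetical uniform prior
    mu, sd = emulator(theta)
    s_sim = rng.gauss(mu, sd)            # emulated summary statistic
    if abs(s_sim - s_obs) < 0.1:
        accepted.append(theta)

post_mean = sum(accepted) / len(accepted)
```

The costly model is called only n + 1 times here (n design points plus the fake observation), against 5000 emulator calls in the ABC loop.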
A first emulator proposed in the paper is to use local regression, as in Beaumont et al. (2002), except that it goes the reverse way: the regression model predicts a summary statistic given the parameter value. The second and last emulator relies on Gaussian processes, as in Richard Wilkinson’s as well as Ted Meeds’s and Max Welling’s recent work [also quoted in the paper]. The comparison of the above emulators is based on an ecological community dynamics model. The results are that the stochastic version is superior to the deterministic one, but overall not very useful when implementing the Beaumont et al. (2002) correction. The paper however does not define what deterministic and what stochastic mean…
“We therefore recommend the use of local regressions instead of Gaussian processes.”
While I find the conclusions of the paper somewhat over-optimistic given the range of the experiment and the limitations of the emulator options (like non-parametric conditional density estimation), it seems to me that this is a direction to be pursued as we need to be able to simulate directly a vector of summary statistics instead of the entire data process, even when considering an approximation to the distribution of those summaries.
Here is the third set of slides for my third year statistics course. Nothing out of the ordinary, but the opportunity to link statistics and simulation for students not yet exposed to Monte Carlo methods. (No ABC yet, but who knows?, I may use ABC as an entry to Bayesian statistics, following Don Rubin’s example! Surprising typo on the Project Euclid page for this 1984 paper, by the way…) On Monday, I had the pleasant surprise to see Shravan Vasishth in the audience, as he is visiting Université Denis Diderot (Paris 7) this month.
Here are some entries I spotted in the past days as of potential interest, on which I will not have enough time to comment:
- arXiv:1410.0163: Instrumental Variables: An Econometrician’s Perspective by Guido Imbens
- arXiv:1410.0123: Deep Tempering by Guillaume Desjardins, Heng Luo, Aaron Courville, Yoshua Bengio
- arXiv:1410.0255: Variance reduction for irreversible Langevin samplers and diffusion on graphs by Luc Rey-Bellet, Konstantinos Spiliopoulos
- arXiv:1409.8502: Combining Particle MCMC with Rao-Blackwellized Monte Carlo Data Association for Parameter Estimation in Multiple Target Tracking by Juho Kokkala, Simo Särkkä
- arXiv:1409.8185: Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models by Theodoros Tsiligkaridis, Keith W. Forsythe
- arXiv:1409.7986: Hypothesis testing for Markov chain Monte Carlo by Benjamin M. Gyori, Daniel Paulin
- arXiv:1409.7672: Order-invariant prior specification in Bayesian factor analysis by Dennis Leung, Mathias Drton
- arXiv:1409.7458: Beyond Maximum Likelihood: from Theory to Practice by Jiantao Jiao, Kartik Venkat, Yanjun Han, Tsachy Weissman
- arXiv:1409.7419: Identifying the number of clusters in discrete mixture models by Cláudia Silvestre, Margarida G. M. S. Cardoso, Mário A. T. Figueiredo
- arXiv:1409.7287: Identification of jump Markov linear models using particle filters by Andreas Svensson, Thomas B. Schön, Fredrik Lindsten
- arXiv:1409.7074: Variational Pseudolikelihood for Regularized Ising Inference by Charles K. Fisher