## simulating correlated random variables [cont’ed]

Posted in Books, Kids, Statistics with tags , , , , on May 28, 2015 by xi'an

Following a recent post on the topic, and comments ‘Og’s readers kindly provided on that post, the picture is not as clear as I wished it was… Indeed, on the one hand, non-parametric measures of correlation based on ranks are, as pointed out by Clara Grazian and others, invariant under monotonic transforms and hence producing a Gaussian pair or a Uniform pair with the intended rank correlation is sufficient to return a correlated sample for any pair of marginal distributions by the (monotonic) inverse cdf transform.  On the other hand, if correlation is understood as Pearson linear correlation, (a) it is not always defined and (b) there does not seem to be a generic approach to simulate from an arbitrary triplet (F,G,ρ) [assuming the three entries are compatible]. When Kees pointed out Pascal van Kooten‘s solution by permutation, I thought this was a terrific resolution, but after thinking about it a wee bit more, I am afraid it is only an approximation, i.e., a way to return a bivariate sample with a given empirical correlation. Not the theoretical correlation. Obviously, when the sample is very large, this comes as a good approximation. But when facing a request to simulate a single pair (X,Y), this gets inefficient [and still approximate].

Now, if we aim at exact simulation from a bivariate distribution with the arbitrary triplet (F,G,ρ), why can’t we find a generic method?! I think one fundamental if obvious reason is that the question is just ill-posed. Indeed, there are many ways of defining a joint distribution with marginals F and G and with (linear) correlation ρ. One for each copula. The joint could thus be associated with a Gaussian copula, i.e., (X,Y)=(F⁻¹(Φ(A)),G⁻¹(Φ(B))) when (A,B) is a standardised bivariate normal with the proper correlation ρ’. Or it can be associated with the Archimedian copula

C(u; v) = (u + v − 1)-1/θ,

with θ>0 defined by a (linear) correlation of ρ. Or yet with any other copula… Were the joint distribution perfectly well-defined, it would then mean that ρ’ or θ (or whatever natural parameter is used for that copula) do perfectly parametrise this distribution instead of the correlation coefficient ρ. All that remains then is to simulate directly from the copula, maybe a theme for a future post…

## approximate approximate Bayesian computation [not a typo!]

Posted in Books, Statistics, University life with tags , , , , , on January 12, 2015 by xi'an

“Our approach in handling the model uncertainty has some resemblance to statistical ‘‘emulators’’ (Kennedy and O’Hagan, 2001), approximative methods used to express the model uncertainty when simulating data under a mechanistic model is computationally intensive. However, emulators are often motivated in the context of Gaussian processes, where the uncertainty in the model space can be reasonably well modeled by a normal distribution.”

Pierre Pudlo pointed out to me the paper AABC: Approximate approximate Bayesian computation for inference in population-genetic models by Buzbas and Rosenberg that just appeared in the first 2015 issue of Theoretical Population Biology. Despite the claim made above, including a confusion on the nature of Gaussian processes, I am rather reserved about the appeal of this AA rated ABC…

“When likelihood functions are computationally intractable, likelihood-based inference is a challenging problem that has received considerable attention in the literature (Robert and Casella, 2004).”

The ABC approach suggested therein is doubly approximate in that simulation from the sampling distribution is replaced with simulation from a substitute cheaper model. After a learning stage using the costly sampling distribution. While there is convergence of the approximation to the genuine ABC posterior under infinite sample and Monte Carlo sample sizes, there is no correction for this approximation and I am puzzled by its construction. It seems (see p.34) that the cheaper model is build by a sort of weighted bootstrap: given a parameter simulated from the prior, weights based on its distance to a reference table are constructed and then used to create a pseudo-sample by weighted sampling from the original pseudo-samples. Rather than using a continuous kernel centred on those original pseudo-samples, as would be the suggestion for a non-parametric regression. Each pseudo-sample is accepted only when a distance between the summary statistics is small enough. This bootstrap flavour is counter-intuitive in that it requires a large enough sample from the true  sampling distribution to operate with some confidence… I also wonder at what happens when the data is not iid.  (I added the quote above as another source of puzzlement, since the book is about cases when the likelihood is manageable.)

## ABC with emulators

Posted in Books, Statistics with tags , , , , , , , on January 9, 2015 by xi'an

A paper on the comparison of emulation methods for Approximate Bayesian Computation was recently arXived by Jabot et al. The idea is to bypass costly simulations of pseudo-data by running cheaper simulation from a pseudo-model or emulator constructed via a preliminary run of the original and costly model. To borrow from the paper introduction, ABC-Emulation runs as follows:

1. design a small number n of parameter values covering the parameter space;
2. generate n corresponding realisations from the model and store the corresponding summary statistics;
3. build an emulator (model) based on those n values;
4. run ABC using the emulator in lieu of the original model.

A first emulator proposed in the paper is to use local regression, as in Beaumont et al. (2002), except that it goes the reverse way: the regression model predicts a summary statistics given the parameter value. The second and last emulator relies on Gaussian processes, as in Richard Wilkinson‘s as well as Ted Meeds’s and Max Welling‘s recent work [also quoted in the paper]. The comparison of the above emulators is based on an ecological community dynamics model. The results are that the stochastic version is superior to the deterministic one, but overall not very useful when implementing the Beaumont et al. (2002) correction. The paper however does not define what deterministic and what stochastic mean…

“We therefore recommend the use of local regressions instead of Gaussian processes.”

While I find the conclusions of the paper somewhat over-optimistic given the range of the experiment and the limitations of the emulator options (like non-parametric conditional density estimation), it seems to me that this is a direction to be pursued as we need to be able to simulate directly a vector of summary statistics instead of the entire data process, even when considering an approximation to the distribution of those summaries.

## Statistics slides (3)

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , on October 9, 2014 by xi'an

Here is the third set of slides for my third year statistics course. Nothing out of the ordinary, but the opportunity to link statistics and simulation for students not yet exposed to Monte Carlo methods. (No ABC yet, but who knows?, I may use ABC as an entry to Bayesian statistics, following Don Rubin’s example! Surprising typo on the Project Euclid page for this 1984 paper, by the way…) On Monday, I had the pleasant surprise to see Shravan Vasishth in the audience, as he is visiting Université Denis Diderot (Paris 7) this month.

## Monte Carlo simulation and resampling methods for social science [book review]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , on October 6, 2014 by xi'an

Monte Carlo simulation and resampling methods for social science is a short paperback written by Thomas Carsey and Jeffrey Harden on the use of Monte Carlo simulation to evaluate the adequacy of a model and the impact of assumptions behind this model. I picked it in the library the other day and browse through the chapters during one of my métro rides. Definitely not an in-depth reading, so be warned!

Overall, I think the book is doing a good job of advocating the use of simulation to evaluate the pros and cons of a given model (rephrased as data generating process) when faced with data. And doing it in R. After some rudiments in probability theory and in R programming, it briefly explains the use of resident random generators if not of how to handle new distributions and then spend a large part of the book on simulation around generalised and regular linear models. For instance, in the linear model, the authors test the impact of heterocedasticity, multicollinearity, measurement error, omitted variable(s), serial correlation, clustered data, and heavy-tailed errors. While this is a perfect way of exploring those semi-hidden hypotheses behind the linear model, I wonder at the impact on students of this exploration. On the one hand, they will perceive the importance of those assumptions and hopefully remember them. On the other hand, and this is a very recurrent criticism of mine, this implies a lot of maturity from the students, i.e., they have to distinguish the data, the model [maybe] behind the data, the finite if large number of hypotheses one can test, and the interpretation of the outcome of a simulation test… Given that they were introduced to basic probability just a few chapters before, this expectation [from the students] may prove unrealistic. (And a similar criticism applies to the following chapters, from GLM to jackknife and bootstrap.)

At the end of the book, the authors ask the question as to how could a reader use the information in this book towards one’s work. Drafting a generic protocol for this reader, who is supposed to consider “alterations to the data generating process” (p.272) and to “identify a possible problem or assumption violation” (p.271). Thus requiring a readership “who has some training in quantitative methods” (p.1). And then some more. But I definitely sympathise with the goal of confronting models and theory with the harsh reality of simulation output!

## plenty of new arXivals!

Posted in Statistics, University life with tags , , , , , on October 2, 2014 by xi'an

Here are some entries I spotted in the past days as of potential interest, for which I will have not enough time to comment:

• arXiv:1410.0163: Instrumental Variables: An Econometrician’s Perspective by Guido Imbens
• arXiv:1410.0123: Deep Tempering by Guillaume Desjardins, Heng Luo, Aaron Courville, Yoshua Bengio
• arXiv:1410.0255: Variance reduction for irreversible Langevin samplers and diffusion on graphs by Luc Rey-Bellet, Konstantinos Spiliopoulos
• arXiv:1409.8502: Combining Particle MCMC with Rao-Blackwellized Monte Carlo Data Association for Parameter Estimation in Multiple Target Tracking by Juho Kokkala, Simo Särkkä
• arXiv:1409.8185: Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models by Theodoros Tsiligkaridis, Keith W. Forsythe
• arXiv:1409.7986: Hypothesis testing for Markov chain Monte Carlo by Benjamin M. Gyori, Daniel Paulin
• arXiv:1409.7672: Order-invariant prior specification in Bayesian factor analysis by Dennis Leung, Mathias Drton
• arXiv:1409.7458: Beyond Maximum Likelihood: from Theory to Practice by Jiantao Jiao, Kartik Venkat, Yanjun Han, Tsachy Weissman
• arXiv:1409.7419: Identifying the number of clusters in discrete mixture models by Cláudia Silvestre, Margarida G. M. S. Cardoso, Mário A. T. Figueiredo
• arXiv:1409.7287: Identification of jump Markov linear models using particle filters by Andreas Svensson, Thomas B. Schön, Fredrik Lindsten
• arXiv:1409.7074: Variational Pseudolikelihood for Regularized Ising Inference by Charles K. Fisher

## vector quantile regression

Posted in pictures, Statistics, University life with tags , , , , , , , on July 4, 2014 by xi'an

My Paris-Dauphine colleague Guillaume Carlier recently arXived a statistics paper entitled Vector quantile regression, co-written with Chernozhukov and Galichon. I was most curious to read the paper as Guillaume is primarily a mathematical analyst working on optimisation problems like optimal transport. And also because I find quantile regression difficult to fathom as a statistical problem. (As it happens, both his co-authors are from econometrics.) The results in the paper are (i) to show that a d-dimensional (Lebesgue) absolutely continuous random variable Y can always be represented as the deterministic transform Y=Q(U), where U is a d-dimensional [0,1] uniform (the paper expresses this transform as conditional on a set of regressors Z, but those essentially play no role) and Q is monotonous in the sense of being the gradient of a convex function,

$Q(u) = \nabla q(u)$ and $\{Q(u)-Q(v)\}^\text{T}(u-v)\ge 0;$

(ii) to deduce from this representation a unique notion of multivariate quantile function; and (iii) to consider the special case when the quantile function Q can be written as the linear

$\beta(U)^\text{T}Z$

where β(U) is a matrix. Hence leading to an estimation problem.

While unsurprising from a measure theoretic viewpoint, the representation theorem (i) is most interesting both for statistical and simulation reasons. Provided the function Q can be easily estimated and derived, respectively. The paper however does not provide a constructive tool for this derivation, besides indicating several characterisations as solutions of optimisation problems. From a statistical perspective, a non-parametric estimation of  β(.) would have useful implications in multivariate regression, although the paper only considers the specific linear case above. Which solution is obtained by a discretisation of all variables and  linear programming.