simulating correlated random variables [cont’ed]
Following a recent post on the topic, and comments ‘Og’s readers kindly provided on that post, the picture is not as clear as I wished it was… Indeed, on the one hand, non-parametric measures of correlation based on ranks are, as pointed out by Clara Grazian and others, invariant under monotonic transforms and hence producing a Gaussian pair or a Uniform pair with the intended rank correlation is sufficient to return a correlated sample for any pair of marginal distributions by the (monotonic) inverse cdf transform. On the other hand, if correlation is understood as Pearson linear correlation, (a) it is not always defined and (b) there does not seem to be a generic approach to simulate from an arbitrary triplet (F,G,ρ) [assuming the three entries are compatible]. When Kees pointed out Pascal van Kooten‘s solution by permutation, I thought this was a terrific resolution, but after thinking about it a wee bit more, I am afraid it is only an approximation, i.e., a way to return a bivariate sample with a given empirical correlation. Not the theoretical correlation. Obviously, when the sample is very large, this comes as a good approximation. But when facing a request to simulate a single pair (X,Y), this gets inefficient [and still approximate].
Now, if we aim at exact simulation from a bivariate distribution with the arbitrary triplet (F,G,ρ), why can’t we find a generic method?! I think one fundamental if obvious reason is that the question is just ill-posed. Indeed, there are many ways of defining a joint distribution with marginals F and G and with (linear) correlation ρ. One for each copula. The joint could thus be associated with a Gaussian copula, i.e., (X,Y)=(F⁻¹(Φ(A)),G⁻¹(Φ(B))) when (A,B) is a standardised bivariate normal with the proper correlation ρ’. Or it can be associated with the Archimedian copula
C(u; v) = (u-θ + v-θ − 1)-1/θ,
with θ>0 defined by a (linear) correlation of ρ. Or yet with any other copula… Were the joint distribution perfectly well-defined, it would then mean that ρ’ or θ (or whatever natural parameter is used for that copula) do perfectly parametrise this distribution instead of the correlation coefficient ρ. All that remains then is to simulate directly from the copula, maybe a theme for a future post…