Archive for arXiv

living on the edge [of the canal]

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on December 15, 2021 by xi'an

Last month, Roberto Casarin, Radu Craiu, Lorenzo Frattarolo and myself posted an arXiv paper on a unified approach to antithetic sampling. To which I mostly and modestly contributed while visiting Roberto in Venezia two years ago (although it seems much farther than that!). I have always found antithetic sampling fascinating, albeit mostly unachievable in realistic situations, except (and approximately) by quasi-random tools. The original approach dates back to Hammersley and Morton, circa 1956, when they optimally couple X=F⁻(U) and Y=F⁻(1-U), with U Uniform, although there is no clear-cut extension beyond pairs or above dimension one. While the search for optimal and feasible antithetic plans dried out in the mid-1980’s, despite near successes by Rubinstein and others, the focus switched to Latin hypercube sampling.

The construction of a general antithetic sampling scheme is based on sampling uniformly an edge within an undirected graph in the d-dimensional hypercube, under some (three) assumptions on the edges to achieve uniformity for the marginals. This construction achieves the smallest Kullback-Leibler divergence between the resulting joint and the product of uniforms. And it can be furthermore constrained to be d-countermonotonic, ie such that a non-linear sum of the components is constant. We also show that the proposal leads to closed-form Kendall’s τ and Spearman’s ρ. Which can be used to assess different d-countermonotonic schemes, incl. earlier ones found in the literature. The antithetic sampling proposal can be applied in Monte Carlo, Markov chain Monte Carlo, and sequential Monte Carlo settings. In a stochastic volatility example of the later (SMC) we achieve performances similar to the quasi-Monte Carlo approach of Mathieu Gerber and Nicolas Chopin.

GANs as density estimators

Posted in Books, Statistics with tags , , , , , , , on October 15, 2021 by xi'an

I recently read an arXival entitled Conditional Sampling With Monotone GAN by Kovakchi et al., who construct  a mapping T that transforms or pushes forward a reference measure þ() like a multivariate Normal distribution to a target conditional distribution ð(dθ|x).  Which makes the proposal a type of normalising flow, except it does not require a Jacobian derivation… The mapping T is monotonous and block triangular in order to be invertible. It is learned from data by minimising a functional divergence between Tþ(dθ) and ð(dθ|x), for instance GAN least square or GAN Wasserstein penalties and representing T as a neural network.  Where monotonicity is imposed by a Lagrangian. The authors “note that global minimizers of [their GAN criterion] can also be used for conditional density estimation” but I fail to understand the distinction in that once T is constructed, the estimated conditional density is automatically available. However my main source of puzzlement is at the worth of this construction, since it does not provide an exact generative process for the conditional distribution, while requiring many generations from the joint distribution. Rather than a comparison with MCMC, which is not applicable in untractable generative models, a comparison with less expensive ABC solutions would have been appropriate, I think. And the paper is missing any quantification on the quality or asymptotics of the density estimate provided by this involved approximation, as most of the recent literature on normalising flows and friends. (A point acknowledged by the authors in the supplementary material section.)

“In this regard, the MGANs approach introduced in the article belongs to the category of sampling techniques such as MCMC, whose goal is to generate independent samples from the law of y|x, as opposed to assuming some structural form of the probability measure directly.”

I am unsure I understand the above remark as MCMC methods are intrinsically linked with the exact probability distribution, exploiting either some conditional representations as in Gibbs or at the very least the ability to compute the joint density…


thou shalt not arXiv

Posted in Statistics with tags , , , , , , on September 29, 2021 by xi'an

Just heard of the Australian Research Council (ARC) rejecting applications mentioning arXivals and other preprints! Which means the applicants had lost a unique opportunity to get funding. Quite ridiculous when considering frontier research and peculiar (the European Research Council definitely allows for preprints). It now seems the ARC has backtracked on this rule, but the rejected applications may remain rejected…


Posted in Statistics with tags , , , , , , , on August 18, 2021 by xi'an

Our survey paper on Rao-Blackwellisation (and the first Robert&Roberts published paper!) just appeared on-line as part of the International Statistical Review mini-issue in honour of C.R. Rao on the occasion of his 100th birthday. (With an unfortunate omission of my affiliation with Warwick!). While the papers are unfortunately beyond a paywall, except for a few weeks!, the arXiv version is still available (and presumably with less typos!).

ten computer codes that transformed science

Posted in Books, Linux, R, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , on April 23, 2021 by xi'an

In a “Feature” article of 21 January 2021, Nature goes over a poll on “software tools that have had a big impact on the world of science”. Among those,

the Fortran compiler (1957), which is one of the first symbolic languages, developed by IBM. This is the first computer language I learned (in 1982) and one of the two (with SAS) I ever coded on punch cards for the massive computers of INSEE. I quickly and enthusiastically switched to Pascal (and the Apple IIe) the year after and despite an attempt at moving to C, I alas kept the Pascal programming style in my subsequent C codes (until I gave up in the early 2000’s!). Moving to R full time, even though I had been using Splus since a Unix version was produced. Interestingly, a later survey of Nature readers put R at the top of the list of what should have been included!, incidentally including Monte Carlo algorithms into the list (and I did not vote in that poll!),

the fast Fourier transform (1965), co-introduced by John Tukey, but which I never ever used (or at least knowingly!),

arXiv (1991), which was started as an emailed preprint list by Paul Ginsparg at Los Alamos, getting the current name by 1998, and where I only started publishing (or arXiving) in 2007, perhaps because it then sounded difficult to submit a preprint there, perhaps because having a worldwide preprint server sounded more like bother (esp. since we had then to publish our preprints on the local servers) than revolution, perhaps because of a vague worry of being overtaken by others… Anyway, I now see arXiv as the primary outlet for publishing papers, with the possible added features of arXiv-backed journals and Peer Community validations,

the IPython Notebook (2011), by Fernando Pérez, which started by 259 lines of Python code, and turned into Jupyter in 2014. I know nothing about this, but I can relate to the relevance of the project when thinking about Rmarkdown, which I find more and more to be a great way to work on collaborative projects and to teach. And for producing reproducible research. (I do remember writing once a paper in Sweave, but not which one…!)

%d bloggers like this: