Archive for Sobol sequences

dropping a point

Posted in Statistics, University life on September 8, 2020 by xi'an

“A discussion about whether to drop the initial point came up in the plenary tutorial of Fred Hickernell at MCQMC 2020 about QMCPy software for QMC. The issue has been discussed by the pytorch community, and the scipy community, which are both incorporating QMC methods.”

Art Owen recently arXived a paper entitled On dropping the first Sobol’ point, in which he examines the impact of a common practice, namely skipping the first point of a Sobol’ sequence when using quasi-Monte Carlo, by analogy with the burn-in practice for MCMC that aims at eliminating the bias due to the choice of the starting value. Art’s paper shows that by skipping just this one point the rate of convergence of some QMC estimates may degrade, with the error inflated by a factor of order √n, bringing the rate back to Monte Carlo values! As this applies to randomised scrambled Sobol’ sequences, this is quite amazing. The explanation centers on the suppression leaving one region of the hypercube unexplored, with an O(n⁻¹) error ensuing.

The above picture from the paper makes the case in a most obvious way: the mean squared error does not decrease at the same rate for the no-drop and one-drop versions, the slopes being -3/2 and -1, respectively. The paper further “recommends against using round number sample sizes and thinning QMC points.” Conclusion: QMC is not MC!
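As a toy illustration of the "one unexplored region" explanation (and not a reproduction of Art's experiments), here is a minimal Python sketch in dimension one. It assumes that stratified sampling, with one uniform draw in each cell of width 1/n, can stand in for a scrambled net; the function names, the integrand x² and the sample sizes are my own choices for the illustration. Removing a single point leaves one cell empty, and the error moves from order n^(-3/2) to order 1/n.

```python
import math
import random


def stratified_points(n, rng):
    """One uniform draw per cell [k/n, (k+1)/n), returned in random order.

    Assumption: this 1-D stratified sample is only a stand-in for a
    scrambled net; it is not a Sobol' sequence.
    """
    pts = [(k + rng.random()) / n for k in range(n)]
    rng.shuffle(pts)  # so that the "first" point sits in a random cell
    return pts


def rmse_pair(n, reps=200, seed=0, f=lambda x: x * x, truth=1.0 / 3.0):
    """Root mean squared errors of the full and drop-one-point estimators."""
    rng = random.Random(seed)
    se_full = se_drop = 0.0
    for _ in range(reps):
        pts = stratified_points(n, rng)
        full = sum(f(x) for x in pts) / n             # all n points
        drop = sum(f(x) for x in pts[1:]) / (n - 1)   # first point dropped
        se_full += (full - truth) ** 2
        se_drop += (drop - truth) ** 2
    return math.sqrt(se_full / reps), math.sqrt(se_drop / reps)


if __name__ == "__main__":
    for n in (64, 256, 1024):
        full, drop = rmse_pair(n)
        print(n, full, drop, drop / full)
```

In this toy setting the ratio of the two errors grows roughly like √n, which is the same order of degradation as the one described above for the scrambled Sobol’ case.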

Sobol’s Monte Carlo

Posted in Books, pictures, Statistics, University life on December 10, 2016 by xi'an


The name of Ilya Sobol is familiar to researchers in quasi-Monte Carlo methods for his Sobol’ sequences. I was thus surprised to find in my office a small book entitled The Monte Carlo Method by this author, which is a translation of his 1968 book in Russian. I have no idea how it reached my office and I went to check with the library of Paris-Dauphine around the corner [of my corridor] whether it had been lost: apparently, the library got rid of it among a collection of old books… Now, having read through this 67-page book (or booklet as Sobol puts it) makes me somewhat agree with the librarians, in that there is nothing of major relevance in this short introduction. It is quite interesting to go through the book and see the basics of simulation principles and Monte Carlo techniques unfolding, from the inverse cdf principle [established by a rather convoluted proof] to importance sampling, but the amount of information is about equivalent to the Wikipedia entry on the topic. From a historical perspective, it is also captivating to see the efforts to connect physical random generators (such as those based on vacuum tube noise) to shift-register pseudo-random generators created by Sobol in 1958. On a Soviet Strela computer.

While Googling the title of that book could not provide any connection, I found out that a 1994 version had been published under the title of A Primer for the Monte Carlo Method, which is mostly the same as my version, except for a few additional sections on pseudo-random generation, from the congruential method (with a FORTRAN code) to the accept-reject method being then called von Neumann’s instead of Neyman’s, to the notion of constructive dimension of a simulation technique, which amounts to demarginalisation, to quasi-Monte Carlo [for three pages]. A funny side note is that the author notes in the preface that the first translation [now in my office] was published without his permission!
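For readers who have never seen these two building blocks together, here is a minimal Python sketch of a congruential generator feeding an accept-reject sampler. It is not the FORTRAN code of the book: the multiplier and increment are the well-known "quick and dirty" constants from Numerical Recipes, and the target density 6x(1-x), the bound 1.5 and the function names are assumptions of mine for the illustration.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator yielding uniforms in [0, 1).

    The constants are an assumption (Numerical Recipes values),
    not those used in Sobol's book.
    """
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state / m


def accept_reject(n, uniforms, bound=1.5):
    """Draw n values from f(x) = 6x(1-x) on [0,1] with a uniform proposal.

    `bound` must dominate f on [0,1]; the maximum of f is 1.5 at x = 1/2.
    Returns the accepted values and the number of proposals used.
    """
    target = lambda x: 6.0 * x * (1.0 - x)
    accepted, proposed = [], 0
    while len(accepted) < n:
        x, u = next(uniforms), next(uniforms)
        proposed += 1
        if u * bound <= target(x):  # accept with probability f(x) / bound
            accepted.append(x)
    return accepted, proposed


if __name__ == "__main__":
    sample, proposed = accept_reject(5000, lcg(seed=2016))
    print(sum(sample) / len(sample), len(sample) / proposed)
```

The acceptance rate should be close to 1/1.5, i.e., two proposals out of three, since the target integrates to one and the proposal is uniform.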

MCqMC 2014 [day #4]

Posted in pictures, Running, Statistics, Travel, University life on April 11, 2014 by xi'an


I hesitated about changing the above title to “MCqMSmaug” as the plenary talk I attended this morning was given by Wenzel Jakob, who uses Markov chain Monte Carlo methods in image rendering and light simulation. The talk was low-tech, with plenty of pictures and animations (incl. excerpts from recent blockbusters!), but it stressed how much proper rendering relies on powerful MCMC techniques. One point particularly attracted my attention, namely the notion of manifold exploration as it seemed related to my recent zero measure post. (A related video is available on Jakob’s webpage.) You may then wonder where the connection with Smaug could be found: Wenzel Jakob is listed in the credits of both Hobbit movies for his contributions to the visual effects! (Hey, MCMC made Smaug [visual effects the way they are], a cool argument for selling your next MCMC course! I will for sure include a picture of Smaug in my next R class presentation…) The next sessions of the morning opposed Sobol’s memorial to more technical light rendering and I chose Sobol, esp. because I had missed Art Owen’s tutorial on Sunday, as he gave a short presentation on using Sobol’s criteria to identify variables contributing the most to the variability or extreme values of a function, an extreme value kind of ANOVA, most interesting if far from my simulation area… The afternoon sessions saw MCMC talks by Luke Bornn and Scott Schmidler, both having connections with the Wang-Landau algorithm. Actually, Scott’s talk was the one generating the most animated discussion among all those I attended in MCqMC! (To the point of the chairman rather rudely making faces…)

accelerated ABC

Posted in Books, Mountains, Statistics on January 14, 2014 by xi'an

Richard Wilkinson arXived a paper on accelerated ABC during MCMSki 4, a paper that I almost missed when quickly perusing the daily list. This is another illustration of the “invasion of Gaussian processes” in ABC settings. Maybe under the influence of machine learning.

The paper starts with a link to the synthetic likelihood approximation of Wood (2010, Nature), as in Richard Everitt’s talk last week. Richard (W.) presents the generalised ABC as a kernel-based acceptance probability, using a kernel π(y|x), where y is the observed data and x=x(θ) the simulated one. He proposes a Gaussian process modelling for the log-likelihood (at the observed data y), with a quadratic (in θ) mean and Matérn covariance matrix. Hence the connection with Wood’s synthetic likelihood. Another connection is with Nicolas’ talk on QMC(MC): the θ’s are chosen following a Sobol sequence “in order to minimize the number of design points”. Which requires a reparameterisation to [0,1]^p… I find this “uniform” exploration of the whole parameter space delicate to envision in complex parameter spaces and realistic problems, since the likelihood is highly concentrated on a tiny subregion of the original [0,1]^p. Not to mention the issue of the spurious mass on the boundaries of the hypercube possibly induced by the change of variable. The sequential algorithm of Richard also attempts to eliminate implausible zones of the parameter space, i.e., zones where the likelihood is essentially zero. My worries with this interesting notion are that (a) the early Gaussian process approximations may be poor and hence exclude zones they should not; (b) all Gaussian process approximations at all iterations must be saved; (c) the Sobol sequences apply to the whole [0,1]^p at each iteration but the non-implausible region shrinks at each iteration, which induces a growing inefficiency in the algorithm. The Sobol sequence should be restricted to the previous non-implausible zone.
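To make worry (c) concrete, here is a minimal Python sketch in dimension p=2, which is not Richard's implementation. It builds the first two coordinates of an unscrambled Sobol’ sequence with the usual direction numbers (in natural rather than Gray-code order), maps the points to a parameter box, and counts how many fall in a non-implausible zone. The disc of radius 0.1, the bounding box and the names `sobol_2d`, `to_box` and `plausible` are all hypothetical choices of mine.

```python
def sobol_2d(n, bits=30):
    """First n points of a 2-D unscrambled Sobol' sequence (natural order)."""
    # Dimension 1 is the van der Corput sequence: m_k = 1 for all k.
    v1 = [1 << (bits - k) for k in range(1, bits + 1)]
    # Dimension 2: m_1 = 1 and m_k = 2 m_{k-1} XOR m_{k-1} (1, 3, 5, 15, ...).
    m = [1]
    for _ in range(bits - 1):
        m.append((2 * m[-1]) ^ m[-1])
    v2 = [m[k - 1] << (bits - k) for k in range(1, bits + 1)]
    points = []
    for i in range(n):
        x = y = 0
        k, j = 0, i
        while j:  # XOR the direction numbers selected by the bits of i
            if j & 1:
                x ^= v1[k]
                y ^= v2[k]
            j >>= 1
            k += 1
        points.append((x / 2**bits, y / 2**bits))
    return points


def to_box(u, lower, upper):
    """Reparameterisation from [0,1]^p to the box [lower, upper]."""
    return tuple(lo + ui * (hi - lo) for ui, lo, hi in zip(u, lower, upper))


def plausible(theta, centre=(0.5, 0.5), radius=0.1):
    """Hypothetical non-implausible zone: a small disc around `centre`."""
    return sum((t - c) ** 2 for t, c in zip(theta, centre)) <= radius ** 2


if __name__ == "__main__":
    points = sobol_2d(256)
    # Design over the whole unit square: most points are wasted.
    kept_whole = sum(plausible(to_box(u, (0.0, 0.0), (1.0, 1.0))) for u in points)
    # Design restricted to a bounding box of the zone: most points are kept.
    kept_tight = sum(plausible(to_box(u, (0.4, 0.4), (0.6, 0.6))) for u in points)
    print(kept_whole, kept_tight)
```

With a zone covering about 3% of the unit square, only a handful of the 256 design points are of any use, while rescaling the same points to a box around the zone keeps most of them.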

Overall, an interesting proposal that would need more prodding to understand whether or not it is robust to poor initialisation and complex structures. And a proposal belonging to the estimated likelihood branch of ABC, which makes use of the final Gaussian process approximation to run an MCMC algorithm. Without returning to pseudo-data simulation, replacing it with log-likelihood simulation.

“These algorithms sample space randomly and naively and do not learn from previous simulations”

The above criticism is moderated in a footnote about ABC-SMC using the “current parameter value to determine which move to make next [but] parameters visited in previous iterations are not taken into account”. I still find it excessive in that SMC algorithms and in particular ABC-SMC algorithms are completely free to use the whole past to build the new proposal. This was clearly enunciated in our earlier population Monte Carlo papers. For instance, the complete collection of past particles can be recycled by recomputing their weights through our AMIS algorithm, as illustrated by Jukka Corander in one genetics application.
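To show what recycling the whole past means in practice, here is a minimal one-dimensional Python sketch in the spirit of AMIS, not the implementation from our paper. It assumes Gaussian proposals, the same number of particles at each iteration, and a toy unnormalised N(3,1) target; the function names and tuning values are illustrative only. At every iteration, all past particles are reweighted against the mixture of all proposals used so far, and the next proposal is fitted to this weighted sample.

```python
import math
import random


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def amis(target, mu0, sd0, iters=5, n_per_iter=500, seed=1):
    """AMIS-style adaptive importance sampling with Gaussian proposals.

    `target` is an unnormalised density. Returns the final proposal mean
    and standard deviation, all particles, and their normalised weights.
    """
    rng = random.Random(seed)
    proposals = []  # (mu, sd) of every proposal used so far
    xs = []         # all particles simulated so far
    mu, sd = mu0, sd0
    for _ in range(iters):
        proposals.append((mu, sd))
        xs.extend(rng.gauss(mu, sd) for _ in range(n_per_iter))
        # Recompute ALL weights against the equal-weight mixture of proposals.
        ws = []
        for x in xs:
            mix = sum(normal_pdf(x, m, s) for m, s in proposals) / len(proposals)
            ws.append(target(x) / mix)
        total = sum(ws)
        # Fit the next proposal to the weighted sample (moment matching).
        mu = sum(w * x for w, x in zip(ws, xs)) / total
        var = sum(w * (x - mu) ** 2 for w, x in zip(ws, xs)) / total
        sd = max(math.sqrt(var), 1e-3)  # guard against a degenerate proposal
    return mu, sd, xs, [w / total for w in ws]


if __name__ == "__main__":
    toy_target = lambda x: math.exp(-0.5 * (x - 3.0) ** 2)  # unnormalised N(3,1)
    mu, sd, xs, ws = amis(toy_target, mu0=0.0, sd0=3.0)
    print(mu, sd, len(xs))
```

The point of the mixture denominator is that particles simulated from the early, poorly located proposals are not discarded: they keep contributing to the final estimate, with weights that remain stable because the denominator accounts for every proposal ever used.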