living on the edge [of the canal]

Posted in Books, pictures, Statistics, Travel, University life on December 15, 2021 by xi'an

Last month, Roberto Casarin, Radu Craiu, Lorenzo Frattarolo and I posted an arXiv paper on a unified approach to antithetic sampling. To which I mostly and modestly contributed while visiting Roberto in Venezia two years ago (although it seems much farther than that!). I have always found antithetic sampling fascinating, albeit mostly unachievable in realistic situations, except (and approximately) by quasi-random tools. The original approach dates back to Hammersley and Morton, circa 1956, who optimally coupled X=F⁻(U) and Y=F⁻(1-U), with U Uniform, although there is no clear-cut extension beyond pairs or above dimension one. While the search for optimal and feasible antithetic plans dried up in the mid-1980’s, despite near successes by Rubinstein and others, the focus switched to Latin hypercube sampling.
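For a single pair in dimension one, the Hammersley and Morton coupling is easy to sketch. The snippet below is only a minimal illustration and not code from the paper: it assumes an Exponential(1) target, chosen here because its quantile function F⁻ is explicit, and all function names are made up for the example. It compares the variance of the pair average (X+Y)/2 with that of the average of two independent draws.

```python
import math
import random
from statistics import mean, pvariance


def exp_inv_cdf(u):
    """Quantile function F^-(u) of the Exponential(1) distribution (assumed target)."""
    return -math.log(1.0 - u)


def uniform_open(rng):
    """Uniform draw on the open interval (0, 1), so that both u and 1 - u are valid."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def antithetic_pair(inv_cdf, rng):
    """Return (X, Y) with X = F^-(U) and Y = F^-(1 - U) for a single uniform U."""
    u = uniform_open(rng)
    return inv_cdf(u), inv_cdf(1.0 - u)


def compare(n_pairs=20000, seed=1):
    """Mean and variance of pair averages, antithetic versus independent."""
    rng = random.Random(seed)
    anti = [0.5 * sum(antithetic_pair(exp_inv_cdf, rng)) for _ in range(n_pairs)]
    indep = [
        0.5 * (exp_inv_cdf(uniform_open(rng)) + exp_inv_cdf(uniform_open(rng)))
        for _ in range(n_pairs)
    ]
    return mean(anti), pvariance(anti), mean(indep), pvariance(indep)


mean_anti, var_anti, mean_indep, var_indep = compare()
print("antithetic :", mean_anti, var_anti)
print("independent:", mean_indep, var_indep)
```

Both averages estimate the same expectation (1 for this target), and the negative correlation within each pair is what reduces the variance of the antithetic version. Nothing in this sketch extends beyond pairs or beyond dimension one, which is precisely the difficulty the paper addresses.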

The construction of a general antithetic sampling scheme is based on sampling uniformly an edge within an undirected graph in the d-dimensional hypercube, under some (three) assumptions on the edges to achieve uniformity for the marginals. This construction achieves the smallest Kullback-Leibler divergence between the resulting joint and the product of uniforms. And it can be furthermore constrained to be d-countermonotonic, i.e., such that a non-linear sum of the components is constant. We also show that the proposal leads to closed-form Kendall’s τ and Spearman’s ρ. Which can be used to assess different d-countermonotonic schemes, incl. earlier ones found in the literature. The antithetic sampling proposal can be applied in Monte Carlo, Markov chain Monte Carlo, and sequential Monte Carlo settings. In a stochastic volatility example of the latter (SMC) we achieve performances similar to the quasi-Monte Carlo approach of Mathieu Gerber and Nicolas Chopin.

continuous herded Gibbs sampling

Posted in Books, pictures, Statistics on June 28, 2021 by xi'an

Read a short paper by Laura Wolf and Marcus Baum on Gibbs herding, where herding is a technique of “deterministic sampling”, for instance selecting points over the support of the distribution by matching exact and empirical (or “empirical”!) moments. Which reminds me of the principal points devised by my late friend Bernhard Flury. With an unclear argument as to why it could take over random sampling:

“random numbers are often generated by pseudo-random number generators, hence are not truly random”

Especially since the aim is to “draw samples from continuous multivariate probability densities.” The construction of such a sample proceeds sequentially, by adding a new (T+1)-th point to the existing sample of y’s by maximising in x the discrepancy

$(T+1)\mathbb E^Y[k(x,Y)]-\sum_{t=1}^T k(x,y_t)$

where k(·,·) is a kernel, e.g. a Gaussian density. Hence a complexity that grows as O(T). The current paper suggests using Gibbs “sampling” to update one component of x at a time. Using the conditional version of the above discrepancy. Making the complexity grow as O(dT) in d dimensions.
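To make the maximisation step concrete, here is a rough one-dimensional sketch of plain kernel herding (not the Gibbs version of the paper). It rests on several assumptions of convenience: a standard Normal target, a Gaussian density kernel with bandwidth h, for which the expectation E[k(x,Y)] is available in closed form as the N(0,1+h²) density, and a crude grid search in place of a proper optimiser. The function names and default values are hypothetical.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def herd(n_points, h=0.5, half_width=4.0, grid_size=801):
    """Greedy kernel herding for a N(0, 1) target, maximising over a fixed grid."""
    step = 2.0 * half_width / (grid_size - 1)
    grid = [-half_width + i * step for i in range(grid_size)]
    # E^Y[k(x, Y)] for Y ~ N(0, 1) and k(x, y) the N(y, h^2) density at x
    target_term = [gauss_pdf(x, 0.0, 1.0 + h * h) for x in grid]
    sample = []
    for T in range(n_points):
        best_x, best_score = None, -math.inf
        for x, e in zip(grid, target_term):
            # (T+1) E[k(x, Y)] - sum_t k(x, y_t): the sum is what costs O(T)
            score = (T + 1) * e - sum(gauss_pdf(x, y, h * h) for y in sample)
            if score > best_score:
                best_x, best_score = x, score
        sample.append(best_x)
    return sample


print(herd(10))
```

The inner sum over the existing points is what makes the cost of each new point grow with T. The resulting sequence is entirely deterministic: running the function twice returns the same points.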

I remain puzzled by the whole thing as these samples cannot be used as regular random or quasi-random samples. And in particular do not produce unbiased estimators of anything. Obviously. The production of such samples being furthermore computationally costly, it is also unclear to me that they could even be used for quick & dirty approximations of a target sample.

QMC at CIRM

Posted in Mountains, pictures, Statistics, Travel, University life on October 21, 2020 by xi'an

dropping a point

Posted in Statistics, University life on September 8, 2020 by xi'an

“A discussion about whether to drop the initial point came up in the plenary tutorial of Fred Hickernell at MCQMC 2020 about QMCPy software for QMC. The issue has been discussed by the pytorch community, and the scipy community, which are both incorporating QMC methods.”

Art Owen recently arXived a paper entitled On dropping the first Sobol’ point in which he examines the impact of a common practice consisting in skipping the first point of a Sobol’ sequence when using quasi-Monte Carlo. By analogy with the burn-in practice for MCMC that aims at eliminating the bias due to the choice of the starting value. Art’s paper shows that by skipping just this one point the rate of convergence of some QMC estimates may drop by a factor, bringing the rate back to Monte Carlo values! As this applies to randomised scrambled Sobol’ sequences, this is quite amazing. The explanation centers on the suppression leaving one region of the hypercube unexplored, with an O(n⁻¹) error ensuing.

The above picture from the paper makes the case in a most obvious way: the mean squared error is not decreasing at the same rate for the no-drop and one-drop versions, since the rates are -3/2 and -1, respectively. The paper further “recommends against using round number sample sizes and thinning QMC points.” Conclusion: QMC is not MC!
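The “unexplored region” explanation can be seen on a toy case. The sketch below is not taken from the paper and deliberately departs from its setting: it uses the first two coordinates of an unscrambled Sobol’ sequence, in natural (non-Gray-code) order, built by hand with the standard direction numbers for these two dimensions. It then counts the empty cells over all dyadic partitions of the unit square into 2^m boxes, first for points 0,…,2^m-1 and then for points 1,…,2^m. Function names are hypothetical.

```python
def sobol_2d(n_points, bits=30):
    """First n_points of the unscrambled 2D Sobol' sequence, natural order."""
    # dimension 1 (van der Corput): direction numbers V_k = 2^-k
    v1 = [1 << (bits - k) for k in range(1, bits + 1)]
    # dimension 2: m_1 = 1 and m_k = 2 m_{k-1} XOR m_{k-1}, V_k = m_k 2^-k
    m = [1]
    for _ in range(bits - 1):
        m.append((2 * m[-1]) ^ m[-1])
    v2 = [m[k - 1] << (bits - k) for k in range(1, bits + 1)]
    scale = float(1 << bits)
    points = []
    for i in range(n_points):
        a = b = 0
        k, j = 0, i
        while j:
            if j & 1:  # XOR in the direction number of every set bit of i
                a ^= v1[k]
                b ^= v2[k]
            j >>= 1
            k += 1
        points.append((a / scale, b / scale))
    return points


def empty_cells(points, m):
    """Number of empty cells over all 2^j x 2^(m-j) dyadic partitions, j = 0..m."""
    empty = 0
    for j in range(m + 1):
        nx, ny = 1 << j, 1 << (m - j)
        occupied = {(int(x * nx), int(y * ny)) for x, y in points}
        empty += nx * ny - len(occupied)
    return empty


m = 4
pts = sobol_2d(2 ** m + 1)
print("no drop :", empty_cells(pts[:-1], m))  # points 0 .. 2^m - 1
print("one drop:", empty_cells(pts[1:], m))   # points 1 .. 2^m
```

With no drop, every cell of every partition holds exactly one point, so the count is zero. Once the first point (the origin) is dropped and replaced by the next point of the sequence, some cells near the origin are left empty while others hold two points. This only illustrates the loss of balance, not the convergence rates established in the paper for scrambled sequences.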

ABC by QMC

Posted in Books, Kids, Statistics, University life on November 5, 2018 by xi'an

A paper by Alexander Buchholz (CREST) and Nicolas Chopin (CREST) on quasi-Monte Carlo methods for ABC is going to appear in the Journal of Computational and Graphical Statistics. I had missed the opportunity when it was posted on arXiv and only became aware of the paper’s contents when I reviewed Alexander’s thesis for the doctoral school. The fact that the parameters are simulated (in ABC) from a prior that is quite generally a standard distribution while the pseudo-observations are simulated from a complex distribution (associated with the intractability of the likelihood function) means that the use of quasi-Monte Carlo sequences is in general only possible for the first part.

The ABC context studied there is close to the original version of the ABC rejection scheme [as opposed to SMC and importance versions], the main difference lying in the use of M pseudo-observations instead of one (of the same size as the initial data). This repeated version has been discussed and abandoned in a strict Monte Carlo framework in favor of M=1 as it increases the overall variance, but the paper uses this version to show that the multiplication of pseudo-observations in a quasi-Monte Carlo framework does not increase the variance of the estimator. (Since the variance apparently remains constant when taking into account the generation time of the pseudo-data, we can however dispute the interest of this multiplication, except to produce a constant variance estimator, for some targets, or to be used for convergence assessment.) The article also covers the bias correction solution of Lee and Łatuszyński (2014).

Due to the simultaneous presence of pseudo-random and quasi-random sequences in the approximations, the authors use the notion of mixed sequences, for which they extend a one-dimension central limit theorem. The paper’s focus on the estimation of Z(ε), the normalization constant of the ABC density, i.e., the predictive probability of accepting a simulation, which can be estimated at a speed of O(N⁻¹) where N is the number of QMC simulations, is a wee bit puzzling as I cannot figure out the relevance of this constant (function of ε), especially since the result does not seem to generalize directly to other ABC estimators.
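For readers unfamiliar with the notation, the following sketch shows what the repeated ABC rejection scheme and the estimate of Z(ε) look like in plain (pseudo-random) Monte Carlo, with no QMC involved. The model is a toy of my own choosing and not an example from the paper: a N(0,1) prior on θ and a single N(θ,1) observation, for which Z(ε) is available exactly and can serve as a check. All names are hypothetical.

```python
import random
from statistics import NormalDist


def abc_rejection(y_obs, eps, n_params, m_pseudo, rng):
    """Toy ABC with M pseudo-observations per parameter draw.

    Assumed model: theta ~ N(0, 1) and y | theta ~ N(theta, 1).
    Returns the (theta, weight) pairs and the estimate of Z(eps), where the
    weight of theta is the fraction of its M pseudo-observations accepted.
    """
    draws = []
    for _ in range(n_params):
        theta = rng.gauss(0.0, 1.0)
        hits = sum(
            abs(rng.gauss(theta, 1.0) - y_obs) <= eps for _ in range(m_pseudo)
        )
        draws.append((theta, hits / m_pseudo))
    z_hat = sum(w for _, w in draws) / n_params
    return draws, z_hat


def exact_z(y_obs, eps):
    """Exact acceptance probability: marginally, y ~ N(0, 2) in this toy model."""
    marginal = NormalDist(0.0, 2.0 ** 0.5)
    return marginal.cdf(y_obs + eps) - marginal.cdf(y_obs - eps)


rng = random.Random(0)
_, z_m1 = abc_rejection(0.5, 0.3, 20000, 1, rng)
_, z_m5 = abc_rejection(0.5, 0.3, 4000, 5, rng)
print("M=1:", z_m1, " M=5:", z_m5, " exact:", exact_z(0.5, 0.3))
```

The two runs use the same total number of pseudo-observations (20,000), which is the relevant budget when the simulation of pseudo-data dominates the cost.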

A second half of the paper considers a sequential version of ABC, as in ABC-SMC and ABC-PMC, where the proposal distribution is based on a Normal mixture with a small number of components, estimated from the (particle) sample of the previous iteration. Even though efficient techniques for estimating this mixture are available, this innovative step requires a calculation time that should be taken into account in the comparisons. The construction of a decreasing sequence of tolerances ε also seems pushed beyond and below what a sequential approach like that of Del Moral, Doucet and Jasra (2012) would produce, seemingly with the justification that lower tolerances should always be preferred. This is not necessarily the case, as recent articles by Li and Fearnhead (2018a, 2018b) and ours have shown (Frazier et al., 2018). Overall, since ABC methods are large consumers of simulation, it is interesting to see how the contribution of QMC sequences results in the reduction of variance and to hope to see appropriate packages added for standard distributions. However, since the most consuming part of the algorithm is due to the simulation of the pseudo-data, in most cases, it would seem that the most relevant focus should be on QMC add-ons on this part, which may be feasible for models with a huge number of standard auxiliary variables as for instance in population evolution.