Archive for quasi-Monte Carlo methods

Monte Carlo with determinantal processes [reply from the authors]

Posted in Books, Statistics with tags , , , , , , , , , , , , , , on September 22, 2016 by xi'an

[Rémi Bardenet and Adrien Hardy have written a reply to my comments of today on their paper, which is more readable as a post than as comments, so here it is. I appreciate the intention, as well as the perfect editing of the reply, suited for a direct posting!]

Thanks for your comments, Xian. As a foreword, a few people we met also had the intuition that DPPs would be relevant for Monte Carlo, but no result so far was backing this claim. As it turns out, we had to work hard to prove a CLT for importance-reweighted DPPs, using some deep recent results on orthogonal polynomials. We are currently working on turning this probabilistic result into practical algorithms. For instance, efficient sampling of DPPs is indeed an important open question, to which most of your comments refer. Although this question is out of the scope of our paper, note however that our results do not depend on how you sample. Efficient sampling of DPPs, along with other natural computational questions, is actually the crux of an ANR grant we just got, so hopefully in a few years we can write a more detailed answer on this blog! We now answer some of your other points.

“one has to examine the conditions for the result to operate, from the support being within the unit hypercube,”
Any compactly supported measure would do, using dilations, for instance. Note that we don’t assume the support is the whole hypercube.

“to the existence of N orthogonal polynomials wrt the dominating measure, not discussed here”
As explained in Section 2.1.2, it is enough that the reference measure charges some open set of the hypercube, which is for instance the case if it has a density with respect to the Lebesgue measure.

“to the lack of relation between the point process and the integrand,”
Actually, our method depends heavily on the target measure μ. Unlike vanilla QMC, the repulsiveness between the quadrature nodes is tailored to the integration problem.

“changing N requires a new simulation of the entire vector unless I missed the point.”
You’re absolutely right. This is a well-known open issue in probability, see the discussion on Terence Tao’s blog.

“This requires figuring out the upper bounds on the acceptance ratios, a “problem-dependent” request that may prove impossible to implement”
We agree that in general this isn’t trivial. However, good bounds are available for all Jacobi polynomials, see Section 3.

“Even without this stumbling block, generating the N-sized sample for dimension d=N (why d=N, I wonder?)”
This is a misunderstanding: we do not say that d=N in any sense. We only say that sampling from a DPP using the algorithm of [Hough et al] requires the same number of operations as orthonormalizing N vectors of dimension N, hence the cubic cost.

1. “how does it relate to quasi-Monte Carlo?”
So far, the connection to QMC is only intuitive: both rely on well-spaced nodes, but using different mathematical tools.

2. “the marginals of the N-th order determinantal process are far from uniform (see Fig. 1), and seemingly concentrated on the boundaries”
This phenomenon is due to orthogonal polynomials. We are investigating more general constructions that give more flexibility.

3. “Is the variance of the resulting estimator (2.11) always finite?”
Yes. For instance, this follows from the inequality below (5.56) since ƒ(x)/K(x,x) is Lipschitz.

4. and 5. We are investigating concentration inequalities to answer these points.

6. “probabilistic numerics produce an epistemic assessment of uncertainty, contrary to the current proposal.”
A partial answer may be our Remark 2.12. You can interpret DPPs as putting a Gaussian process prior over ƒ and sequentially sampling from the posterior variance of the GP.

Monte Carlo with determinantal processes

Posted in Books, Statistics with tags , , , , , , , , on September 21, 2016 by xi'an

Rémi Bardenet and Adrien Hardy have arXived this paper a few months ago but I was a bit daunted by the sheer size of the paper, until I found the perfect opportunity last week..! The approach relates to probabilistic numerics as well as Monte Carlo, in that it can be seen as a stochastic version of Gaussian quadrature. The authors mention in the early pages a striking and recent result by Delyon and Portier that using an importance weight where the sampling density is replaced with the leave-one-out kernel estimate produces faster convergence than the regular Monte Carlo √n! Which reminds me of quasi-Monte Carlo of course, discussed in the following section (§1.3), with the interesting [and new to me] comment that the theoretical rate (and thus the improvement) does not occur until the sample size N is exponential in the dimension. Bardenet and Hardy achieve similar super-efficient convergence by mixing quadrature with repulsive simulation. For almost every integrable function.

The fact that determinantal point processes (on the unit hypercube) and Gaussian quadrature methods are connected is not that surprising once one considers that such processes are associated with densities made of determinants, which matrices are kernel-based, K(x,y), with K expressed as a sum of orthogonal polynomials. An N-th order determinantal process in dimension d satisfies a generalised Central Limit Theorem in that the speed of convergence is


which means faster than √N…  This is more surprising, of course, even though one has to examine the conditions Continue reading

MCqMC 2016 [#4]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on August 21, 2016 by xi'an

In his plenary talk this morning, Arnaud Doucet discussed the application of pseudo-marginal techniques to the latent variable models he has been investigating for many years. And its limiting behaviour towards efficiency, with the idea of introducing correlation in the estimation of the likelihood ratio. Reducing complexity from O(T²) to O(T√T). With the very surprising conclusion that the correlation must go to 1 at a precise rate to get this reduction, since perfect correlation would induce a bias. A massive piece of work, indeed!

The next session of the morning was another instance of conflicting talks and I hoped from one room to the next to listen to Hani Doss’s empirical Bayes estimation with intractable constants (where maybe SAME could be of interest), Youssef Marzouk’s transport maps for MCMC, which sounds like an attractive idea provided the construction of the map remains manageable, and Paul Russel’s adaptive importance sampling that somehow sounded connected with our population Monte Carlo approach. (With the additional step of considering transform maps.)

An interesting item of information I got from the final announcements at MCqMC 2016 just before heading to Monash, Melbourne, is that MCqMC 2018 will take place in the city of Rennes, Brittany, on July 2-6. Not only it is a nice location on its own, but it is most conveniently located in space and time to attend ISBA 2018 in Edinburgh the week after! Just moving from one Celtic city to another Celtic city. Along with other planned satellite workshops, this occurrence should make ISBA 2018 more attractive [if need be!] for participants from oversea.

MCqMC 2016 [#2]

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , on August 17, 2016 by xi'an

In her plenary talk this morning, Christine Lemieux discussed connections between quasi-Monte Carlo and copulas, covering a question I have been considering for a while. Namely, when provided with a (multivariate) joint cdf F, is there a generic way to invert a vector of uniforms [or quasi-uniforms] into a simulation from F? For Archimedian copulas (as we always can get back to copulas), there is a resolution by the Marshall-Olkin representation,  but this puts a restriction on the distributions F that can be considered. The session on synthetic likelihoods [as introduced by Simon Wood in 2010] put together by Scott Sisson was completely focussed on using normal approximations for the distribution of the vector of summary statistics, rather than the standard ABC non-parametric approximation. While there is a clear (?) advantage in using a normal pseudo-likelihood, since it stabilises with much less simulations than a non-parametric version, I find it difficult to compare both approaches, as they lead to different posterior distributions. In particular, I wonder at the impact of the dimension of the summary statistics on the approximation, in the sense that it is less and less likely that the joint is normal as this dimension increases. Whether this is damaging for the resulting inference is another issue, possibly handled by a supplementary ABC step that would take the first-step estimate as summary statistic. (As a side remark, I am intrigued at everyone being so concerned with unbiasedness of methods that are approximations with no assessment of the amount of approximation!) The last session of the day was about multimodality and MCMC solutions, with talks by Hyungsuk Tak, Pierre Jacob and Babak Shababa, plus mine. Hunsuk presented the RAM algorithm I discussed earlier under the title of “love-hate” algorithm, which was a kind reference to my post! (I remain puzzled by the ability of the algorithm to jump to another mode, given that the intermediary step aims at a low or even zero probability region with an infinite mass target.) And Pierre talked about using SMC for Wang-Landau algorithms, with a twist to the classical stochastic optimisation schedule that preserves convergence. And a terrific illustration on a distribution inspired from the Golden Gate Bridge that reminded me of my recent crossing! The discussion around my folded Markov chain talk focussed on the extension of the partition to more than two sets, the difficulty being in generating automated projections, with comments about connections with computer graphic tools. (Too bad that the parallel session saw talks by Mark Huber and Rémi Bardenet that I missed! Enjoying a terrific Burmese dinner with Rémi, Pierre and other friends also meant I could not post this entry on time for the customary 00:16. Not that it matters in the least…)

Rémi Bardenet’s seminar

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on April 7, 2016 by xi'an

Grand Palais from Esplanade des Invalides, Paris, Dec. 07, 2012Next week, Rémi Bardenet is giving a seminar in Paris, Thursday April 14, 2pm, in ENSAE [room 15] on MCMC methods for tall data. Unfortunately, I will miss this opportunity to discuss with Rémi as I will be heading to La Sapienza, Roma, for Clara Grazian‘s PhD defence the next day.  And on Monday afternoon, April 11, Nicolas Chopin will give a talk on quasi-Monte Carlo for sequential problems at Institut Henri Poincaré.

MCMskv #5 [future with a view]

Posted in Kids, Mountains, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on January 12, 2016 by xi'an

As I am flying back to Paris (with an afternoon committee meeting in München in-between), I am reminiscing on the superlative scientific quality of this MCMski meeting, on the novel directions in computational Bayesian statistics exhibited therein, and on the potential settings for the next meeting. If any.

First, as hopefully obvious from my previous entries, I found the scientific program very exciting, with almost uniformly terrific talks, and a coverage of the field of computational Bayesian statistics that is perfectly tuned to my own interest. In that sense, MCMski is my “top one” conference! Even without considering the idyllic location. While some of the talks were about papers I had already read (and commented here), others brought new vistas and ideas. If one theme is to emerge from this meeting it has to be the one of approximate and noisy algorithms, with a wide variety of solutions and approaches to overcome complexity issues. If anything, I wish the solutions would also incorporate the Boxian fact that the statistical models themselves are approximate. Overall, a fantastic program (says one member of the scientific committee).

Second, as with previous MCMski meetings, I again enjoyed the unique ambience of the meeting, which always feels more relaxed and friendly than other conferences of a similar size, maybe because of the après-ski atmosphere or of the special coziness provided by luxurious mountain hotels. This year hotel was particularly pleasant, with non-guests like myself able to partake of some of their facilities. A big thank you to Anto for arranging so meticulously all the details of such a large meeting!!! I am even more grateful when realising this is the third time Anto takes over the heavy load of organising MCMski. Grazie mille!

Since this is a [and even the!] BayesComp conference, the current section program chair and board must decide on the  structure and schedule of the next meeting. A few suggestions if I may: I would scrap entirely the name MCMski from the next conference as (a) it may sound like academic tourism for unaware bystanders (who only need to check the program of any of the MCMski conferences to stand reassured!) and (b) its topic go way beyond MCMC. Given the large attendance and equally large proportion of young researchers, I would also advise against hosting the conference in a ski resort for both cost and accessibility reasons [as we had already discussed after MCMskiv], in favour of a large enough town to offer a reasonable range of accommodations and of travel options. Like Chamonix, Innsbruck, Reykjavik, or any place with a major airport about one hour away… If nothing is available with skiing possibilities, so be it! While the outdoor inclinations of the early organisers induced us to pick locations where skiing over lunch break was a perk, any accessible location that allows for a concentration of researchers in a small area and for the ensuing day-long exchange is fine! Among the novelties in the program, the tutorials and the Breaking news! sessions were quite successful (says one member of the scientific committee). And should be continued in one format or another. Maybe a more programming thread could be added as well… And as we had mentioned earlier, to see a stronger involvement of the Young Bayesian section in the program would be great! (Even though the current meeting already had many young researcher  talks.)

MCMskv #4 [house with a vision]

Posted in Statistics with tags , , , , , , , , , , , , on January 9, 2016 by xi'an

OLYMPUS DIGITAL CAMERALast day at MCMskv! Not yet exhausted by this exciting conference, but this was the toughest day with one more session and a tutorial by Art Own on quasi Monte-Carlo. (Not even mentioning the night activities that I skipped. Or the ski break that I did not even consider.) Krys Latunszynski started with a plenary on exact methods for discretised diffusions, with a foray in Bernoulli factory problems. Then a neat session on adaptive MCMC methods that contained a talk by Chris Sherlock on delayed acceptance, where the approximation to the target was built by knn trees. (The adaptation was through the construction of the tree by including additional evaluations of the target density. Another paper sitting in my to-read list for too a long while: the exploitation of the observed values of π towards improving an MCMC sampler has always be “obvious” to me even though I could not see any practical way of doing so. )

It was wonderful that Art Owen accepted to deliver a tutorial at MCMskv on quasi-random Monte Carlo. Great tutorial, with a neat coverage of the issues most related to Monte Carlo integration. Since quasi-random sequences have trouble with accept/reject methods, a not-even-half-baked idea that came to me during Art’s tutorial was that the increased computing power granted by qMC could lead to a generic integration of the Metropolis-Hastings step in a Rao-Blackwellised manner. Art mentioned he was hoping that in a near future one could switch between pseudo- and quasi-random in an almost automated manner when running standard platforms like R. This would indeed be great, especially since quasi-random sequences seem to be available at the same cost as their pseudo-random counterpart. During the following qMC session, Art discussed the construction of optimal sequences on sets other than hypercubes (with the surprising feature that projecting optimal sequences from the hypercube does not work). Mathieu Gerber presented the quasi-random simulated annealing algorithm he developed with Luke Bornn that I briefly discussed a while ago. Or thought I did as I cannot trace a post on that paper! While the fact that annealing also works with quasi-random sequences is not astounding, the gain over random sequences shown on two examples is clear. The session also had a talk by Lester Mckey who relies Stein’s discrepancy to measure the value of an approximation to the true target. This was quite novel, with a surprising connection to Chris Oates’ talk and the use of score-based control variates, if used in a dual approach.

Another great session was the noisy MCMC one organised by Paul Jenkins (Warwick), with again a coherent presentation of views on the quality or lack thereof of noisy (or inexact) versions, with an update from Richard Everitt on inexact MCMC, Felipe Medina Aguayo (Warwick) on sufficient conditions for noisy versions to converge (and counterexamples), Jere Koskela (Warwick) on a pseudo-likelihood approach to the highly complex Kingman’s coalescent model in population genetics (of ABC fame!), and Rémi Bardenet on the tall data approximations techniques discussed in a recent post. Having seen or read most of those results previously did not diminish the appeal of the session.