## Archive for Monte Carlo integration

## pitfalls of nested Monte Carlo

Posted in Books, pictures, Statistics, University life with tags AISTATS, Luc Devroye, Monte Carlo integration, nested sampling, Peter Glynn, Russian roulette on December 19, 2016 by xi'an**A** few days ago, Tom Rainforth, Robert Cornish, Hongseok Yang, and Frank Wood from Oxford arXived a paper on the limitations of nested Monte Carlo. By nested Monte Carlo [not *nested sampling*], they mean Monte Carlo techniques used to evaluate the expectation of a non-linear transform of an expectation, which often calls for a plug-in resolution. The main result is that this expectation cannot be evaluated by an unbiased estimator. Which is only mildly surprising. I do wonder if there still exist series solutions à la Glynn and Rhee, as in the Russian roulette version. Which is mentioned in a footnote. Or specially tuned versions, as suggested by some techniques found in Devroye’s book, where the expectation of the exponential of another expectation is considered… (The paper is quite short, which may be correlated with the format imposed by some machine-learning conference proceedings like AISTATS.)

## flea circus

Posted in Books, Kids, pictures, R, Statistics with tags coupling, cross validated, fleas, Leonhard Euler, Markov chains, Markov random field, Monte Carlo integration, occupancy, Poisson approximation, R, random walk, stationarity on December 8, 2016 by xi'an**A**n old riddle found on X validated asking for Monte Carlo resolution but originally given on Project Euler:

A 30×30 grid of squares contains 30² fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random. What is the expected number of unoccupied squares after 50 bell rings, up to six decimal places?

The debate on X validated is whether or not a Monte Carlo resolution is feasible. Up to six decimals, certainly not. But with some lower precision, certainly. Here is a rather basic R code where the 50 steps are operated on the 900 squares, rather than the 900 fleas. This saves some time by avoiding empty squares.

```r
exprmt=function(n=10,T=50){
  mean=0
  for (t in 1:n){
    board=rep(1,900)                     # one flea per square
    for (b in 1:T){                      # T bell rings
      beard=rep(0,900)                   # occupancy after the ring
      # first corner square
      if (board[1]>0){
        poz=c(0,1,0,30)
        ne=rmultinom(1,board[1],prob=(poz!=0))
        beard[1+poz]=beard[1+poz]+ne}
      # all squares but the first and last corners
      for (i in (2:899)[board[-1][-899]>0]){
        u=(i-1)%%30+1;v=(i-1)%/%30+1
        poz=c(-(u>1),(u<30),-30*(v>1),30*(v<30))
        ne=rmultinom(1,board[i],prob=(poz!=0))
        beard[i+poz]=beard[i+poz]+ne}
      # last corner square
      if (board[900]>0){
        poz=c(-1,0,-30,0)
        ne=rmultinom(1,board[900],prob=(poz!=0))
        beard[900+poz]=beard[900+poz]+ne}
      board=beard}
    mean=mean+sum(board==0)}             # count the empty squares
  return(mean/n)}
```

The function returns an empirical average over n replications. The handling of the borderline squares is presumably awkward, since it involves adding zeros to keep the structure the same… Nonetheless, it produces an approximation that is rather close to the approximate expected value of 900/e, in about three minutes on my laptop.

```
> exprmt(n=1e3)
[1] 331.082
> 900/exp(1)
[1] 331.0915
```
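For comparison, a naive flea-by-flea version (a sketch of mine rather than the code actually used above, and presumably slower since it loops over the 900 fleas) could look like this:

```r
# one replication: track each of the 900 fleas individually over T bell rings
flea=function(T=50){
  pos=1:900                                # flea i starts on square i
  for (t in 1:T){
    u=(pos-1)%%30+1                        # column index of each flea
    v=(pos-1)%/%30+1                       # row index of each flea
    pos=sapply(1:900, function(i){
      moves=c(-1,1,-30,30)[c(u[i]>1,u[i]<30,v[i]>1,v[i]<30)]
      pos[i]+sample(moves,1)})             # jump to a uniformly chosen neighbour
  }
  sum(tabulate(pos,nbins=900)==0)          # number of empty squares
}
```

Averaging repeated calls to flea() gives values in the same 331 range, but the square-based bookkeeping above avoids looping over individual fleas.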

Further gains follow from considering only half of the squares, as there are two independent processes acting in parallel. I looked at an alternative and much faster approach using the stationary distribution, with the stationary being the Multinomial(450, (2/1740, 3/1740, …, 4/1740, …, 3/1740, 2/1740)), with probabilities proportional to the number of neighbours: 2 in the corners, 3 on the sides, and 4 in the interior, the normalising constant 1740 being the total number of neighbours over the 450 squares of one half of the board (2×2 + 56×3 + 392×4). (The process, strictly speaking, has *no stationary distribution*, since it is periodic. But one can consider instead the subprocess indexed by even times.) This multinomial approximation does seem to hold when looking at the occupancy frequencies, after defining the stationary weights as:

```r
# number of neighbours of each square on a BxB board: 2 in the corners, 3 on the sides, 4 inside
inva=function(B=30){
  return(c(2,rep(3,B-2),2,
           rep(c(3,rep(4,B-2),3),B-2),
           2,rep(3,B-2),2))}
```

namely

```
> mn=0;n=1e8 #14 clock hours!
> proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30)
> # each multinomial draw spreads the 450 fleas over the 900 squares with weights proz,
> # so the 450 squares of the other colour always stay empty and are removed below
> for (t in 1:n)
+   mn=mn+table(rmultinom(1,450,prob=proz))[1:4]
> mn=mn/n
> mn[1]=mn[1]-450
> mn
     0      1      2      3
166.11 164.92  82.56  27.71
> exprmt(n=1e6) #55 clock hours!! (a version of exprmt returning the frequencies of 0, 1, 2, 3 occupants)
[1] 165.36 165.69  82.92  27.57
```

my original confusion being that the Poisson approximation had not yet taken over… (Of course, computing the first frequency for the stationary distribution does not require any simulation, since it is the sum over the 450 squares of the complement probabilities to the power 450, i.e., 166.1069.)
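For the record, this value can be checked in one line, reusing inva and proz as defined above:

```r
# each reachable square with d neighbours stays empty with probability (1-d/1740)^450
sum((1-inva(30)[proz>0]/1740)^450)   # returns the 166.1069 quoted above
```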

## Monte Carlo with determinantal processes [reply from the authors]

Posted in Books, Statistics with tags Bayesian inference, central limit theorem, determinantal point processes, Gaussian processes, Gaussian quadrature, hypercube, Lebesgue measure, Monte Carlo integration, Monte Carlo Statistical Methods, orthogonal polynomials, probabilistic numerics, quasi-Monte Carlo methods, super-efficiency, Terence Tao, zero variance importance sampling on September 22, 2016 by xi'an*[Rémi Bardenet and Adrien Hardy have written a reply to my comments of today on their paper, which is more readable as a post than as comments, so here it is. I appreciate the intention, as well as the perfect editing of the reply, suited for a direct posting!]*

**T**hanks for your comments, Xian. As a foreword, a few people we met also had the intuition that DPPs would be relevant for Monte Carlo, but no result so far backed this claim. As it turns out, we had to work hard to prove a CLT for importance-reweighted DPPs, using some deep recent results on orthogonal polynomials. We are currently working on turning this probabilistic result into practical algorithms. For instance, efficient sampling of DPPs is indeed an important open question, to which most of your comments refer. Although this question is outside the scope of our paper, note however that our results do not depend on how you sample. Efficient sampling of DPPs, along with other natural computational questions, is actually the crux of an ANR grant we just got, so hopefully in a few years we can write a more detailed answer on this blog! We now answer some of your other points.

*“one has to examine the conditions for the result to operate, from the support being within the unit hypercube,”*

Any compactly supported measure would do, using dilations, for instance. Note that we don’t assume the support is the whole hypercube.

*“to the existence of N orthogonal polynomials wrt the dominating measure, not discussed here”*

As explained in Section 2.1.2, it is enough that the reference measure charges some open set of the hypercube, which is for instance the case if it has a density with respect to the Lebesgue measure.

*“to the lack of relation between the point process and the integrand,”*

Actually, our method depends heavily on the target measure μ. Unlike vanilla QMC, the repulsiveness between the quadrature nodes is tailored to the integration problem.

*“changing N requires a new simulation of the entire vector unless I missed the point.”*

You’re absolutely right. This is a well-known open issue in probability, see the discussion on Terence Tao’s blog.

*“This requires figuring out the upper bounds on the acceptance ratios, a “problem-dependent” request that may prove impossible to implement”*

We agree that in general this isn’t trivial. However, good bounds are available for all Jacobi polynomials, see Section 3.

*“Even without this stumbling block, generating the N-sized sample for dimension d=N (why d=N, I wonder?)”*

This is a misunderstanding: we do not say that d=N in any sense. We only say that sampling from a DPP using the algorithm of [Hough et al] requires the same number of operations as orthonormalizing N vectors of dimension N, hence the cubic cost.

*1. “how does it relate to quasi-Monte Carlo?”*

So far, the connection to QMC is only intuitive: both rely on well-spaced nodes, but using different mathematical tools.

*2. “the marginals of the N-th order determinantal process are far from uniform (see Fig. 1), and seemingly concentrated on the boundaries”*

This phenomenon is due to orthogonal polynomials. We are investigating more general constructions that give more flexibility.

*3. “Is the variance of the resulting estimator (2.11) always finite?”*

Yes. For instance, this follows from the inequality below (5.56) since ƒ(x)/K(x,x) is Lipschitz.

*4. and 5.* We are investigating concentration inequalities to answer these points.

*6. “probabilistic numerics produce an epistemic assessment of uncertainty, contrary to the current proposal.”*

A partial answer may be our Remark 2.12. You can interpret DPPs as putting a Gaussian process prior over ƒ and sequentially sampling from the posterior variance of the GP.

## Monte Carlo with determinantal processes

Posted in Books, Statistics with tags central limit theorem, determinantal point processes, Gaussian quadrature, Monte Carlo integration, orthogonal polynomials, probabilistic numerics, quasi-Monte Carlo methods, super-efficiency, zero variance importance sampling on September 21, 2016 by xi'an**R**émi Bardenet and Adrien Hardy arXived this paper a few months ago, but I was a bit daunted by the sheer size of the paper, until I found the perfect opportunity last week…! The approach relates to probabilistic numerics as well as Monte Carlo, in that it can be seen as a stochastic version of Gaussian quadrature. The authors mention in the early pages a striking and recent result by Delyon and Portier that using an importance weight where the sampling density is replaced with the leave-one-out kernel estimate produces faster convergence than the regular Monte Carlo rate of √n! Which reminds me of quasi-Monte Carlo of course, discussed in the following section (§1.3), with the interesting [and new to me] comment that the theoretical rate (and thus the improvement) does not occur until the sample size N is exponential in the dimension. Bardenet and Hardy achieve similar super-efficient convergence by mixing quadrature with repulsive simulation. For almost every integrable function.
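To fix ideas on the Delyon and Portier leave-one-out trick, here is a minimal R sketch of my own (the Gaussian proposal and toy integrand are illustration choices, not taken from their paper):

```r
# importance sampling where the proposal density in the weights is replaced
# with a leave-one-out kernel density estimate built from the other draws
f=function(x) exp(-x^2)                    # toy integrand, with integral sqrt(pi) over R
n=1e3
x=rnorm(n)                                 # draws from a N(0,1) proposal
h=bw.nrd0(x)                               # rule-of-thumb bandwidth
K=outer(x,x,function(a,b) dnorm((a-b)/h)/h)
diag(K)=0                                  # leave the point itself out
qhat=rowSums(K)/(n-1)                      # leave-one-out density estimates at the draws
mean(f(x)/qhat)                            # to be compared with sqrt(pi)=1.7725
```

The only point of the sketch is that the weights involve a density estimate built from the remaining draws rather than the proposal density itself; the actual assumptions and convergence rates are in Delyon and Portier's paper.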

The fact that determinantal point processes (on the unit hypercube) and Gaussian quadrature methods are connected is not that surprising once one considers that such processes are associated with densities made of determinants of kernel matrices K(x,y), with K expressed as a sum of orthogonal polynomials. An N-th order determinantal process in dimension d satisfies a generalised Central Limit Theorem, in that the speed of convergence is of order √(N^{1+1/d}), which means faster than √N… This is more surprising, of course, even though one has to examine the conditions…

## The answer is e, what was the question?!

Posted in Books, R, Statistics with tags Buffon's needle, cross validated, Gnedenko, Monte Carlo integration, Poisson process, simulation on February 12, 2016 by xi'an**A** rather exotic question on X validated: *since π can be approximated by random sampling over a unit square, is there an equivalent for approximating e?* This is an interesting question, as, indeed, why not focus on *e* rather than *π* after all?! But very quickly the artificiality of the problem comes back to hit one in the face… With no restriction, it is straightforward to think of a Monte Carlo average that converges to *e* as the number of simulations grows to infinity. However, methods such as Poisson or normal simulation require complex functions like the sine, cosine, or exponential… But then someone came up with a connection to the great Russian probabilist Gnedenko, who gave as an exercise that the average number N of uniforms one needs to add to exceed 1 is exactly *e*, because it writes as

E[N] = ∑_{n≥0} P(N > n) = ∑_{n≥0} P(U₁+⋯+Uₙ ≤ 1) = ∑_{n≥0} 1/n! = e.
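A minimal R sketch of Gnedenko's recipe (the function name and defaults are mine):

```r
# average number of uniform draws needed for the running sum to exceed 1
gnedenko=function(n=1e5){
  counts=integer(n)
  for (i in 1:n){
    s=0
    while (s<=1){
      s=s+runif(1)
      counts[i]=counts[i]+1}
  }
  mean(counts)                         # converges to e as n grows
}
```

Running gnedenko(1e5) returns values close to 2.71828, using nothing but a uniform generator.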

(The result was later detailed in the American Statistician as an introductory simulation exercise akin to Buffon's needle.) This is a brilliant solution as it does not involve anything but a standard uniform generator. I do not think it relates in any close way to generating from a Poisson process with parameter λ=1, where the probability of exceeding one in a single step is *e*⁻¹; hence a Geometric variable derived from this process is also an unbiased estimator of *e*. As an aside, W. Huber proposed the following elegantly concise line of R code to implement an approximation of *e*:

`1/mean(n*diff(sort(runif(n+1))) > 1)`

Hard to beat, isn’t it?! (With n set beforehand, say n=1e6. Although it is more exactly a Monte Carlo approximation of (1−1/n)^−(n+1), the reciprocal of the probability that a given spacing between n+1 sorted uniforms exceeds 1/n, which adds a further level of approximation to the solution….)

## Je reviendrai à Montréal [D-2]

Posted in pictures, Statistics, Travel, University life with tags ABC, ABC in Montréal, Approximate Bayesian computation, Bayesian inference, Canada, London, MCMC, Monte Carlo integration, Monte Carlo Statistical Methods, Montréal, NIPS, NIPS 2015, probabilistic numerics, Robert Charlebois, scalability on December 9, 2015 by xi'an**I** have spent the day and more completing and compiling slides for my contrapuntal perspective on probabilistic numerics, back in Montréal, for the NIPS 2015 workshop of December 11 on this theme. As I presume the kind invitation by the organisers was connected with my somewhat critical posts on the topic, I mostly… The day after, I will be flying back to London for the CFE (Computational and Financial Econometrics) workshop, somewhat reluctantly as there will be another NIPS workshop that day on scalable Monte Carlo.

Je veux revoir le long désert

Des rues qui n’en finissent pas

Qui vont jusqu’au bout de l’hiver

Sans qu’il y ait trace de pas

## Je reviendrai à Montréal [NIPS 2015]

Posted in pictures, Statistics, Travel, University life with tags ABC, ABC in Montréal, Approximate Bayesian computation, Bayesian inference, Canada, MCMC, Monte Carlo integration, Monte Carlo Statistical Methods, Montréal, NIPS, NIPS 2015, probabilistic numerics, Québec, Robert Charlebois, scalability on September 30, 2015 by xi'an**I** will be back in Montréal, as the song by Robert Charlebois goes, for the NIPS 2015 meeting there, more precisely for the workshops of December 11 and 12, 2015, on probabilistic numerics and ABC [à Montréal]. I was invited to give the first talk by the organisers of the NIPS workshop on probabilistic numerics, presumably to present a contrapuntal perspective on this mix of Bayesian inference with numerical issues, following my somewhat critical posts on the topic. And I also plan to attend some lectures in the (second) NIPS workshop on ABC methods. Which does not leave much free space for yet another workshop on Approximate Bayesian Inference! The day after, while I am flying back to London, there will be a workshop on scalable Monte Carlo. All workshops are calling for contributed papers to be presented during central poster sessions. To be submitted to abcinmontreal@gmail.com and to probnum@gmail.com and to aabi2015. Before October 16.

Funnily enough, I got a joking email from Brad, bemoaning my traitorous participation in the workshop on probabilistic numerics because of its “anti-MCMC” agenda, as reflected in the summary:

“Integration is the central numerical operation required for Bayesian machine learning (in the form of marginalization and conditioning). Sampling algorithms still abound in this area, although it has long been known that Monte Carlo methods are fundamentally sub-optimal. The challenges for the development of better performing integration methods are mostly algorithmic. Moreover, recent algorithms have begun to outperform MCMC and its siblings, in wall-clock time, on realistic problems from machine learning.

The workshop will review the existing, by now quite strong, theoretical case against the use of random numbers for integration, discuss recent algorithmic developments, relationships between conceptual approaches, and highlight central research challenges going forward.”

Position that I hope to water down in my talk! In any case,

Je veux revoir le long désert

Des rues qui n’en finissent pas

Qui vont jusqu’au bout de l’hiver

Sans qu’il y ait trace de pas