## discussions on Gerber and Chopin

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , on May 29, 2015 by xi'an

As a coincidence, I received my copy of JRSS Series B with the Read Paper by Mathieu Gerber and Nicolas Chopin on sequential quasi Monte Carlo just as I was preparing an arXival of a few discussions on the paper! Among the [numerous and diverse] discussions, a few were of particular interest to me [I highlighted members of the University of Warwick and of Université Paris-Dauphine to suggest potential biases!]:

1. Mike Pitt (Warwick), Murray Pollock et al.  (Warwick) and Finke et al. (Warwick) all suggested combining quasi Monte Carlo with pseudomarginal Metropolis-Hastings, pMCMC (Pitt) and Rao-Bklackwellisation (Finke et al.);
2. Arnaud Doucet pointed out that John Skilling had used the Hilbert (ordering) curve in a 2004 paper;
3. Chris Oates, Dan Simpson and Mark Girolami (Warwick) suggested combining quasi Monte Carlo with their functional control variate idea;
4. Richard Everitt wondered about the dimension barrier of d=6 and about possible slice extensions;
5. Zhijian He and Art Owen pointed out simple solutions to handle a random number of uniforms (for simulating each step in sequential Monte Carlo), namely to start with quasi Monte Carlo and end up with regular Monte Carlo, in an hybrid manner;
6. Hans Künsch points out the connection with systematic resampling à la Carpenter, Clifford and Fearnhead (1999) and wonders about separating the impact of quasi Monte Carlo between resampling and propagating [which vaguely links to one of my comments];
7. Pierre L’Ecuyer points out a possible improvement over the Hilbert curve by a preliminary sorting;
8. Frederik Lindsten and Sumeet Singh propose using ABC to extend the backward smoother to intractable cases [but still with a fixed number of uniforms to use at each step], as well as Mateu and Ryder (Paris-Dauphine) for a more general class of intractable models;
9. Omiros Papaspiliopoulos wonders at the possibility of a quasi Markov chain with “low discrepancy paths”;
10. Daniel Rudolf suggest linking the error rate of sequential quasi Monte Carlo with the bounds of Vapnik and Ĉervonenkis (1977).

The arXiv document also includes the discussions by Julyan Arbel and Igor Prünster (Turino) on the Bayesian nonparametric side of sqMC and by Robin Ryder (Dauphine) on the potential of sqMC for ABC.

## Quasi-Monte Carlo sampling

Posted in Books, Kids, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , on December 10, 2014 by xi'an

“The QMC algorithm forces us to write any simulation as an explicit function of uniform samples.” (p.8)

As posted a few days ago, Mathieu Gerber and Nicolas Chopin will read this afternoon a Paper to the Royal Statistical Society on their sequential quasi-Monte Carlo sampling paper.  Here are some comments on the paper that are preliminaries to my written discussion (to be sent before the slightly awkward deadline of Jan 2, 2015).

Quasi-Monte Carlo methods are definitely not popular within the (mainstream) statistical community, despite regular attempts by respected researchers like Art Owen and Pierre L’Écuyer to induce more use of those methods. It is thus to be hoped that the current attempt will be more successful, it being Read to the Royal Statistical Society being a major step towards a wide diffusion. I am looking forward to the collection of discussions that will result from the incoming afternoon (and bemoan once again having to miss it!).

“It is also the resampling step that makes the introduction of QMC into SMC sampling non-trivial.” (p.3)

At a mathematical level, the fact that randomised low discrepancy sequences produce both unbiased estimators and error rates of order

$\mathfrak{O}(N^{-1}\log(N)^{d-}) \text{ at cost } \mathfrak{O}(N\log(N))$

means that randomised quasi-Monte Carlo methods should always be used, instead of regular Monte Carlo methods! So why is it not always used?! The difficulty stands [I think] in expressing the Monte Carlo estimators in terms of a deterministic function of a fixed number of uniforms (and possibly of past simulated values). At least this is why I never attempted at crossing the Rubicon into the quasi-Monte Carlo realm… And maybe also why the step had to appear in connection with particle filters, which can be seen as dynamic importance sampling methods and hence enjoy a local iid-ness that relates better to quasi-Monte Carlo integrators than single-chain MCMC algorithms.  For instance, each resampling step in a particle filter consists in a repeated multinomial generation, hence should have been turned into quasi-Monte Carlo ages ago. (However, rather than the basic solution drafted in Table 2, lower variance solutions like systematic and residual sampling have been proposed in the particle literature and I wonder if any of these is a special form of quasi-Monte Carlo.) In the present setting, the authors move further and apply quasi-Monte Carlo to the particles themselves. However, they still assume the deterministic transform

$\mathbf{x}_t^n = \Gamma_t(\mathbf{x}_{t-1}^n,\mathbf{u}_{t}^n)$

which the q-block on which I stumbled each time I contemplated quasi-Monte Carlo… So the fundamental difficulty with the whole proposal is that the generation from the Markov proposal

$m_t(\tilde{\mathbf{x}}_{t-1}^n,\cdot)$

has to be of the above form. Is the strength of this assumption discussed anywhere in the paper? All baseline distributions there are normal. And in the case it does not easily apply, what would the gain bw in only using the second step (i.e., quasi-Monte Carlo-ing the multinomial simulation from the empirical cdf)? In a sequential setting with unknown parameters θ, the transform is modified each time θ is modified and I wonder at the impact on computing cost if the inverse cdf is not available analytically. And I presume simulating the θ’s cannot benefit from quasi-Monte Carlo improvements.

The paper obviously cannot get into every detail, obviously, but I would also welcome indications on the cost of deriving the Hilbert curve, in particular in connection with the dimension d as it has to separate all of the N particles, and on the stopping rule on m that means only Hm is used.

Another question stands with the multiplicity of low discrepancy sequences and their impact on the overall convergence. If Art Owen’s (1997) nested scrambling leads to the best rate, as implied by Theorem 7, why should we ever consider another choice?

In connection with Lemma 1 and the sequential quasi-Monte Carlo approximation of the evidence, I wonder at any possible Rao-Blackwellisation using all proposed moves rather than only those accepted. I mean, from a quasi-Monte Carlo viewpoint, is Rao-Blackwellisation easier and is it of any significant interest?

What are the computing costs and gains for forward and backward sampling? They are not discussed there. I also fail to understand the trick at the end of 4.2.1, using SQMC on a single vector instead of (t+1) of them. Again assuming inverse cdfs are available? Any connection with the Polson et al.’s particle learning literature?

Last questions: what is the (learning) effort for lazy me to move to SQMC? Any hope of stepping outside particle filtering?

## a probabilistic proof to a quasi-Monte Carlo lemma

Posted in Books, Statistics, Travel, University life with tags , , , , , on November 17, 2014 by xi'an

As I was reading in the Paris métro a new textbook on Quasi-Monte Carlo methods, Introduction to Quasi-Monte Carlo Integration and Applications, written by Gunther Leobacher and Friedrich Pillichshammer, I came upon the lemma that, given two sequences on (0,1) such that, for all i’s,

$|u_i-v_i|\le\delta\quad\text{then}\quad\left|\prod_{i=1}^s u_i-\prod_{i=1}^s v_i\right|\le 1-(1-\delta)^s$

and the geometric bound made me wonder if there was an easy probabilistic proof to this inequality. Rather than the algebraic proof contained in the book. Unsurprisingly, there is one based on associating with each pair (u,v) a pair of independent events (A,B) such that, for all i’s,

$A_i\subset B_i\,,\ u_i=\mathbb{P}(A_i)\,,\ v_i=\mathbb{P}(B_i)$

and representing

$\left|\prod_{i=1}^s u_i-\prod_{i=1}^s v_i\right| = \mathbb{P}(\cap_{i=1}^s A_i) - \mathbb{P}(\cap_{i=1}^s B_i)\,.$

Obviously, there is no visible consequence to this remark, but it was a good way to switch off the métro hassle for a while! (The book is under review and the review will hopefully be posted on the ‘Og as soon as it is completed.)

## MCQMC2014 in Belgium

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , on October 11, 2013 by xi'an

The conference MCQMC2014 (which stands for Monte Carlo and Quasi-Monte Carlo) will take place in Leuven, Belgium, on April 6-11. More exactly, in the Katholieke Universiteit Leuven, which is the Flemish-speaking side of the split (1968) Catholic University of Leuven, the French speaking side Université Catholique de Louvain being located in Louvain-la-Neuve. After missing MCQMC2012 in Sydney,  I will attend this conference where I give an invited talk (on ABC, what else..?!). As it happens (and kind of logically), I have visited Louvain-la-Neuve many times, especially in the previous era where historical Bayesians Michel Mouchard, Jean-Marie Rolin and Léopold Simar were together in the Statistics department—two of them contributing to the highly formalised “Elements of Bayesian Statistics” that I perused during my PhD thesis in Rouen—, but I have never been to Leuven or to KU Leuven,

A great item of news is that one of the two tutorials (on April 6, 2014) will given by Art Owen, the theme being “ANOVA, global sensitivity, Sobol’ indices and all that“, The second tutorial is by Mike Giles (Oxford) on his approach of multi-level Monte Carlo methods. (If the organisers follow the MCQMC2012 trend, the Sunday afternoon tutorials should follow one another.)

## 9th IMACS seminar on Monte Carlo Methods, Annecy

Posted in Mountains, pictures, R, Running, Statistics, Travel, University life with tags , , , , , , , , , , on July 18, 2013 by xi'an

As astute ‘Og’s readers may have gathered (!), I am now in Annecy, Savoie, for the 9th IMACS seminar on Monte Carlo Methods. Where I was kindly invited to give a talk on ABC. IMACS stands for “International Association for Mathematics and Computers in Simulation” and the conference gathers themes and sensibilities I am not familiar with. And very few statisticians. For instance, I attended a stochastic particle session that had nothing to do with my understanding of particle systems (except for Pierre Del Moral’s mean field talk). The overall focus seems to stand much more around SDEs and quasi-Monte Carlo methods. Both items for which I have a genuine interest but little background, so I cannot report much on the talks I have attended beyond reporting their title. I for instance discovered the multilevel Monte Carlo techniques for SDEs, which sounds like a control variate methodology to reduce the variance w/o reducing the discretisation step. (Another instance is that the proceedings will be published in Mathematics and Computers in Simulation or Monte Carlo Methods and Applications. Two journals I have never published in.) Although I have yet a few months before attending my first MCQMC conference, I presume this is somehow a similar spirit and mix of communities.

At another level, attending a conference in Annecy is a blessing: the city is beautiful, the lake pristine and tantalising in the hot weather, and the surrounding mountains (we are actually quite close to Chamonix!) induce me to go running on both mornings and evenings.

## Winter workshop, Gainesville (day 2)

Posted in pictures, Running, Travel, University life, Wines with tags , , , , , , , , , , , , , on January 21, 2013 by xi'an

On day #2, besides my talk on “empirical Bayes” (ABCel) computation (mostly recycled from Varanasi, photos included), Christophe Andrieu gave a talk on exact approximations, using unbiased estimators of the likelihood and characterising estimators garanteeing geometric convergence (bounded weights, essentially, which is a condition popping out again and again in the Monte Carlo literature). Then Art Owen (father of empirical likelihood among other things!) spoke about QMC for MCMC, a topic that always intringued me.

Indeed, while I see the point of using QMC for specific integration problems, I am more uncertain about its relevance for statistics as a simulation device. Having points uniformly distributed over the unit hypercube in a much more efficient way than a random sample is not helping much when only a tiny region of the unit hypercube, namely the one where the likelihood concentrates, matters. (In other words, we are rarely interested in the uniform distribution over the unit hypercube: we instead want to simulate from a highly irregular and definitely concentrated distribution.) I have the same reservation about the applicability of stratified sampling: the strata have to be constructed in relation with the target distribution. The method Art advocates using a CUD (completely uniformly distributed) sequence as the underlying (deterministic) pseudo-unifom sequence. Highly interesting and I want to read the paper in greater details, but the fact that most simulation steps use a random number of uniforms seems detrimental to the performances of the method in general.

After a lunch break at a terrific BBQ place, with a stop at Lake Alice to watch the alligator(s) I had missed during my morning run, I was able this time to attend till the end Xiao-Li Meng’s talk, where he presented new improvements on bridge sampling based on location-scale (or warping) transforms of the original two-samples to make them share mean and variance. Hani Doss concluded the meeting with a talk on the computation of Bayes factors when using (non-parametric) Dirichlet mixture priors, whose resolution does not require simulations for each value of the scale parameter of the Dirichlet prior, thanks to a Radon-Nykodim derivative representation. (Which nicely connected with Art’s talk in that the latter mentioned therein that most simulation methods are actually based on Riemann integration rather than Lebesgue integration. Hani’s representation is not, with nested sampling being another example.)

We ended up the day with a(nother) barbecue outside, under the stars, in the peace and quiet of a local wood, with wine and laughs, just like George would have concluded the workshop. This was a fitting ending to a meeting dedicated to his memory…

## A survey of [the 60’s] Monte Carlo methods

Posted in Books, R, Statistics, University life with tags , , , , , , , on May 17, 2011 by xi'an

“The only good Monte Carlos are the dead Monte Carlos” (Trotter and Tukey, quoted by Halton)

When I presented my [partial] history of MCM methods in Bristol two months ago, at the Julian Besag memorial, Christophe Andrieu mentioned a 1970 SIAM survey by John Halton on A retrospective and prospective survey of the Monte Carlo method. This is a huge paper (62 pages, 251 references) and it brings a useful light on the advances in the 60’s (the paper was written in 1968). From the reference list, it seems John Halton was planning two books on the Monte Carlo method, but a search on google did not show anything. I also discovered in this list that there was a 1954 RSS symposium (Read Paper?) on Monte Carlo methods. Note that there were at least two books on Monte Carlo published at the time, Hammersley and Handscomb’s 1964 Monte Carlo Methods and Scheider’s 1966 Monte Carlo Method. (Hammerlsey appears as a major figure in this survey.) There is a lot of material in this long review and most of the standard methods are listed: control variate, importance sampling, self-normalised simportance sampling, stratified sampling, antithetic variables, simulation by inversion, rejection or demarginalisation. Variance reduction is presented as the motivation for the alternative methods. Very little is said about the use of Monte Carlo methods in statistics (“many of  [the applications] are primitive and artless“)  I was first very surprised to find sequential Monte Carlomentioned as well, but it later appeared this was Monte Carlo methods for sequential problems, in the spirit of Abraham Wald. While the now forgotten EZH method is mentioned as a promising new method (p.11), the survey also contains an introduction to the conditional Monte Carlo method of Trotter and Tukey (1956) [from whom the above and rather puzzling quote is taken] that could relate to the averaging techniques of Kong, McCullagh, Meng, Nicolae and Tan as found in their 2003 Read Paper….

“The search for randomness is evidently futile” (Halton)

A large part of the review is taken by the study of uniform random generators and by the distinction between random, pseudo-random and quasi-random versions. Halton insists very much on the lack of justification in using non-random generators, even though they work well. He even goes further as to warn about bias because even the truly random generators are discrete. The book covers the pseudo-random generators, starting with the original version of von Neumann, Metropolis, and Ulam, continuing with Lehmer’s well-known congruencial generator, and the Fibonacci generalisation. For testing those generators by statistical tests (again with little theoretical ground), Marsaglia is mentioned.  The paper also covers in great detail the quasi-random sequences, covering low discrepancy requirements, van der Corput’s, Halton’s and Hammersley’s sequences. Halton considers quasi-Monte Carlo as “a branch of numerical analysis”.

The paper concludes with a list of 24 future developments I will cover in another post tomorrow…