Archive for multi-level Monte Carlo

MCqMC [#3]

Posted in Books, pictures, Statistics, Travel, University life on August 20, 2016 by xi'an

On Thursday, Christoph Aistleitner [from TU Graz] gave a plenary talk at MCqMC 2016 around Hermann Weyl's 1916 paper, Über die Gleichverteilung von Zahlen mod. Eins, which demonstrates that the sequence a, 2²a, 3²a, … mod 1 is uniformly distributed on the unit interval when a is irrational. Obviously, the notion was not introduced for simulation purposes, but the construction applies in this setting! At least in a theoretical sense, since, for instance, the result that the sequence (a, a², a³, …) mod 1 is uniformly distributed for almost all a's has yet to produce a single explicit such a. But a nice hour of history of mathematics and number theory: it is not that often that one hears the Riemann zeta function mentioned at a simulation conference!
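As a purely illustrative aside, Weyl's statement is easy to check numerically: the minimal numpy sketch below is my own, with the arbitrary choice a = √2, and looks both at how uniform the fractional parts of n²a appear and at their crude use as a deterministic integration stream.

```python
import numpy as np

a = np.sqrt(2.0)                        # an arbitrary irrational multiplier (illustration only)
n = np.arange(1, 100_001, dtype=float)
u = np.sort((n**2 * a) % 1.0)           # fractional parts of 1²a, 2²a, 3²a, ...

# Kolmogorov-Smirnov-type distance between the empirical cdf and the U(0,1) cdf
ecdf = np.arange(1, u.size + 1) / u.size
print("max |ecdf - cdf|:", np.abs(ecdf - u).max())

# crude use as a deterministic stream: estimating the integral of x² over (0,1), i.e. 1/3
print("integral estimate:", np.mean(u**2))
```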

The following session was a nightmare in that I wanted to attend all four parallel sessions at once! I eventually chose the transport session, in particular because Xiao-Li advertised it at the end of my talk. The connection is that his warp bridge sampling technique provides a folding map between the modes of a target, using a mixture representation of the target and folding all components onto a single distribution. Interestingly, this transformation does not require a partition and preserves the normalising constants [which has a side appeal for bridge sampling of course]. In a problem with an unknown number of modes, the technique could be completed by [our] folding in order to bring the unobserved modes into the support of the folded target. Looking forward to the incoming paper! The last talk of this session was by Matti Vihola, connecting multi-level Monte Carlo and unbiased estimation à la Rhee and Glynn, a paper that I missed when it got arXived last December.
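To fix ideas on the folding part (and only that part), here is a cartoon of my own, not Xiao-Li's actual warp bridge construction: a symmetric two-mode Gaussian mixture, with modes at ±μ taken as known, is folded onto a single mode by a reflection, without losing any mass.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 3.0, 1.0, 100_000

# samples from the two-mode mixture 0.5 N(-mu, sigma²) + 0.5 N(mu, sigma²)
z = sigma * rng.standard_normal(n) + rng.choice([-mu, mu], size=n)

# folding map: reflect the mode at -mu onto the one at +mu
y = np.abs(z)

# the folded samples follow the density N(y; mu, sigma²) + N(y; -mu, sigma²) on y > 0,
# a single-mode target that still integrates to one (the fold preserves the mass)
print("folded sample mean vs mu :", y.mean(), mu)
print("fraction of mass below 0 :", np.mean(y < 0.0))
```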

The last session of the day was about probabilistic numerics. I have already discussed this approach to numerical integration extensively, to the point of being invited to the NIPS workshop as a skeptic! But this was an interesting session, both with introductory aspects and with aspects new to me, especially Chris Oates' description of a PN method treating both the integrand and the integrating measure as uncertain. Another arXival that slipped under my decidedly deficient radar.

retrospective Monte Carlo

Posted in pictures, Running, Statistics, Travel, University life on July 12, 2016 by xi'an

the pond in front of the Zeeman building, University of Warwick, July 1, 2014

The past week I spent in Warwick ended up with a workshop on retrospective Monte Carlo, which covered exact sampling, debiasing, Bernoulli factory problems, and multi-level Monte Carlo, a definitely exciting package! (Not to mention opportunities to go climbing with some participants.) In particular, several talks focussed on the debiasing technique of Rhee and Glynn (2012) [inspired from von Neumann and Ulam, and already discussed in several posts here], including results in functional spaces, as demonstrated by a multifaceted talk by Sergios Agapiou who merged debiasing, deburning, and perfect sampling.

From a general perspective on unbiasing, while there exist sufficient conditions to ensure finite variance and to aim at an optimal version, I feel a broader perspective should be adopted, comparing those estimators with biased versions that take less time to compute. In a diffusion context, Chang-han Rhee presented a detailed argument as to why his debiasing solution achieves the canonical square-root convergence rate, as opposed to the regular discretised diffusion, although multi-level Monte Carlo also achieves this convergence speed. We had a nice discussion about this point at the break, with my slowly dawning understanding that continuous-time processes have much, much stronger reasons for sticking to unbiasedness. At the poster session, I had the nice surprise of reading a poster on the penalty method I had discussed that same morning, used for subsampling when scaling MCMC!
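For readers unfamiliar with the Rhee and Glynn trick, here is a bare-bones sketch of mine of the randomised truncation at the core of their debiasing, run on a deterministic toy sequence (partial sums converging to e) rather than on discretised diffusions: the single-term estimator divides the increment at a random level N by the probability of picking that level, and is unbiased for the limit.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# biased "approximations" y_0, y_1, ... converging to e = 2.71828...
# (deterministic partial sums here, standing in for increasingly fine discretisations)
def y(k):
    return sum(1.0 / math.factorial(j) for j in range(k + 1))

def single_term_estimator(r=0.5):
    """One draw of the single-term estimator Z = (y_N - y_{N-1}) / P(N = n), N geometric."""
    n = rng.geometric(1.0 - r) - 1            # P(N = n) = (1 - r) r^n on {0, 1, 2, ...}
    delta = y(n) - (y(n - 1) if n > 0 else 0.0)
    return delta / ((1.0 - r) * r**n)

draws = np.array([single_term_estimator() for _ in range(100_000)])
print("unbiased estimate of e:", draws.mean(), " vs ", math.e)
```

The very same telescoping-plus-randomisation applies when the y's are themselves Monte Carlo estimators, provided consecutive levels are coupled tightly enough for the variance (and the expected cost) to remain finite, which is where the above discussion of rates comes in.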

On the second day, Gareth Roberts talked about the Zig-Zag algorithm (which reminded me of the cigarette paper brand). This method has connections with slice sampling, but it is a continuous-time method which, in dimension one, roughly means running a constant-velocity particle that starts at a uniform value between 0 and the maximum density value and proceeds horizontally until it hits the boundary of the density, at which time it moves to another uniform value. More specifically, this approach relies on piecewise deterministic Markov processes, a radically new approach to simulating complex targets based on continuous-time simulation, with computing times that [counter-intuitively] do not increase with the sample size.
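Since the verbal description above is admittedly rough, here is a one-dimensional toy sketch of my own of a zig-zag-type sampler for a standard Gaussian target U(x) = x²/2, using the canonical switching rate max(0, v U′(x)) and exact inversion of the event times; it is only meant to convey the continuous-time, piecewise-deterministic flavour, not the subsampling version discussed in the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-d zig-zag process targeting the standard Gaussian, U(x) = x²/2, U'(x) = x:
# a particle moves at constant velocity v = ±1 and flips v at rate max(0, v * x)
def zigzag_gaussian(T=50_000.0):
    x, v, t = 0.0, 1.0, 0.0
    integral_x2 = 0.0                              # accumulates the integral of x(s)² along the path
    while t < T:
        a = v * x                                  # switching rate along the segment is max(0, a + s)
        e = rng.exponential()
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)   # exact first-event time by inversion
        tau = min(tau, T - t)
        integral_x2 += ((x + v * tau) ** 3 - x**3) / (3.0 * v)
        x += v * tau                               # deterministic straight-line move
        v = -v                                     # velocity flip at the event
        t += tau
    return integral_x2 / T

print("time average of x² (should be close to 1):", zigzag_gaussian())
```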

Mark Huber gave another exciting talk around the Bernoulli factory problem, connecting with perfect simulation and demonstrating this is not solely a formal Monte Carlo problem! Some earlier posts here have discussed papers on that problem, but I was unaware of the results bounding [from below] the expected number of steps needed to simulate B(f(p)) from a (p,1-p) coin, if not of the open questions surrounding B(2p). The talk was also great in that it centred on recursion and included a fundamental theorem of perfect sampling! Not that surprising given Mark's recent book on the topic, but exhilarating nonetheless!!!
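As a reminder of what a Bernoulli factory is (the textbook warm-up, not one of Mark's results), the oldest instance is von Neumann's: from flips of a coin with unknown probability p, produce an exact B(f(p)) draw with f(p) = ½, using nothing but the coin.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 0.3                                  # unknown to the algorithm: it may only flip the coin

def flip():
    return rng.random() < p

# von Neumann's Bernoulli factory for f(p) = 1/2: flip the coin twice until the two
# outcomes differ, then return the first one; expected number of flips is 1 / (p (1 - p))
def fair_coin():
    while True:
        a, b = flip(), flip()
        if a != b:
            return a

draws = [fair_coin() for _ in range(100_000)]
print("frequency of heads:", np.mean(draws))     # close to 1/2 whatever p is
```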

The final talk of the second day was given by Peter Glynn, with connections with Chang-han Rhee's talk the previous day, but with a different twist. In particular, Peter showed how to achieve perfect or exact estimation rather than perfect or exact simulation by a fabulous trick: perfect sampling is better understood through the construction of random functions φ¹, φ², … such that X²=φ¹(X¹), X³=φ²(X²), … Hence,

X^t=\varphi^{t-1}\circ\varphi^{t-2}\circ\ldots\circ\varphi^{1}(X^1)

which helps in constructing coupling strategies. However, since the φ's are usually iid, the above has the same distribution as

Y^t=\varphi^{1}\circ\varphi^{2}\circ\ldots\circ\varphi^{t-1}(X^1)

which looks pretty similar but offers much better concentration as t grows, since the backward iterates settle to a limit while the forward ones keep moving. Cutting the function composition at a random point then becomes feasible and produces estimators that are both unbiased and more efficient. (I realise this is not a particularly clear explanation of the idea, which is detailed in an arXival I somewhat missed. Seen this way, Y would also appear much more expensive to compute [than X].)
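A tiny numerical illustration of this forward-versus-backward point, with iid affine contractions φⁱ(x) = a x + bⁱ of my own choosing rather than Peter's construction: the backward composition settles to a limit as t grows (which is what coupling-from-the-past and the debiasing exploit), while the forward composition keeps fluctuating, even though the two share the same distribution at every fixed t.

```python
import numpy as np

rng = np.random.default_rng(4)

# iid random maps phi_i(x) = a x + b_i, with a a contraction and b_i ~ N(0, 1)
a, T = 0.5, 30
b = rng.standard_normal(T)                # b[i-1] plays the role of phi^i
x1 = 0.0

forward, backward = [], []
xf = x1
for t in range(T):
    xf = a * xf + b[t]                    # X^{t+2} = phi^{t+1} o ... o phi^1 (X^1)
    forward.append(xf)
    xb = x1
    for s in range(t, -1, -1):            # Y^{t+2} = phi^1 o ... o phi^{t+1} (X^1)
        xb = a * xb + b[s]                #   (innermost map applied first)
    backward.append(xb)

# same marginal distribution, very different path behaviour
print("last forward iterates :", np.round(forward[-5:], 4))
print("last backward iterates:", np.round(backward[-5:], 4))
```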

coupled filters

Posted in Kids, Statistics, University life on July 11, 2016 by xi'an

Pierre Jacob, Fredrik Lindsten, and Thomas Schön recently arXived a paper on coupled particle filters, a coupling problem that proves much more complicated than expected, due to the discrete nature of particle filters. The starting point of the paper is the use of common (e.g., uniform) random numbers for the generation of each entry in the particle system at each time t, whose maximal correlation gets damaged by the resampling steps (even when using the same uniforms). One suggestion made in the paper for improving the correlation between entries at each time is to resort to optimal transport, using the distance between particles as the criterion. A cheaper alternative, inspired by multi-level Monte Carlo, builds a joint multinomial distribution by optimising the coupling probability (see the sketch below). [Is there any way to iterate this construct instead of considering only the extreme cases of identical values versus independent values?] The authors also recall a “sorted sampling” method proposed by Mike Pitt in 2002, which relies on the empirical cdfs derived from the particle systems and on the inverse cdf technique, and which is the approach I would have first considered, possibly with a smooth transform of both ecdfs in order to optimise the inverse-cdf move. Actually, I have trouble with the notion that the ancestors of a pair of particles should matter, unless one envisions a correlation of the entire path, but I am unsure how one can make paths correlated (besides coupling) and how this impacts likelihood estimation. As shown in the paper, the coupled approximations produce regular versions and, despite the negative bias, fairly accurate evaluations of likelihood ratios, which is all that matters in an MCMC implementation. The paper also proposes a smoothing algorithm based on the Rhee and Glynn (2012) debiasing technique, which operates on expectations against the smoothing distribution (conditional on a value of the parameter θ), which may connect with the notion of simulating correlated paths. The interesting part is that, due to the coupling, the Rhee and Glynn unbiased estimator has a finite (if random) stopping time.
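Here is a sketch of how I understand the coupled multinomial resampling step (the notation and toy weights are mine, not the paper's): for two weight vectors, a maximal coupling of the two ancestor distributions picks a common ancestor with the largest possible probability, namely the overlap Σ min(w¹,w²), and otherwise samples the two indices independently from the residuals.

```python
import numpy as np

rng = np.random.default_rng(5)

def coupled_resampling(w1, w2, n):
    """Draw n pairs of ancestor indices (a1, a2) with a1 ~ w1 and a2 ~ w2 marginally,
    and P(a1 = a2) maximised, equal to sum(min(w1, w2))."""
    overlap = np.minimum(w1, w2)
    alpha = overlap.sum()
    a1, a2 = np.empty(n, dtype=int), np.empty(n, dtype=int)
    common = rng.random(n) < alpha
    k = int(common.sum())
    # with probability alpha, both particle systems pick the same ancestor
    a1[common] = a2[common] = rng.choice(len(w1), size=k, p=overlap / alpha)
    # otherwise, sample independently from the two (disjointly supported) residuals
    a1[~common] = rng.choice(len(w1), size=n - k, p=(w1 - overlap) / (1.0 - alpha))
    a2[~common] = rng.choice(len(w2), size=n - k, p=(w2 - overlap) / (1.0 - alpha))
    return a1, a2

w1 = np.array([0.5, 0.3, 0.2])
w2 = np.array([0.4, 0.4, 0.2])
a1, a2 = coupled_resampling(w1, w2, 100_000)
print("P(same ancestor):", np.mean(a1 == a2), " optimal:", np.minimum(w1, w2).sum())
```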

MCqMC 2014 [day #3]

Posted in pictures, Running, Statistics, Travel, University life, Wines on April 10, 2014 by xi'an


As the second day at MCqMC 2014 was mostly about multi-level Monte Carlo and quasi-Monte Carlo methods, I did not attend many talks but had a long run in the countryside (even saw a pheasant and a heron), worked at “home” on pressing recruiting evaluations, and had a long working session with Pierre Jacob. Plus an evening out sampling (just) a few Belgian beers in the shade of the city hall…

Today was more in my ballpark as there were MCMC talks the whole day! The plenary talk was not about MCMC, as Erich Novak presented a survey of the many available results bounding the complexity of approximating an integral based on a fixed number of evaluations of the integrand, some involving the dimension (and its curse), some not, some achieving the canonical square-root convergence rate and some not as fast, all this depending on the regularity and the size of the classes of integrands considered. In some cases the solution was importance sampling, in other cases quasi-Monte Carlo, and yet other cases remained unsolved. Then Yves Atchadé gave a new perspective on computing the asymptotic variance in the central limit theorem for Markov chains by truncating the autocovariance series, Matti Vihola talked about theoretical orderings of Markov chains that transmuted into the very practical consequence that using more simulations in a pseudo-marginal likelihood approximation improves the acceptance rate and the asymptotic variance (and this applies to ABC-MCMC as well), Radu Craiu proposed a novel processing of adaptive MCMC by treating various approximations to the true target as food for a multiple-try Metropolis algorithm, and Luca Martino had a go at resuscitating the ARMS algorithm of Gilks and Wild (used for a while in BUGS), although the talk did not dissipate all of my misgivings about the multidimensional version! I had more difficulty following the “Warwick session”, which was made up of four talks by current or former students from Warwick, although I appreciated the complexity of the results in infinite-dimensional settings and the novel approximations to diffusion-based Metropolis algorithms. No further session this afternoon as the “social” activity was to visit the nearby Stella Artois brewery! This activity made us very social, for certain, even though there was hardly a soul around in this massively automated factory. (Maybe an ’Og post to come one of those days…)
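Regarding the truncated-autocovariance estimate of the asymptotic variance (the textbook version, not necessarily Atchadé's refinement), here is a toy check of my own on a stationary AR(1) chain, for which the true value is 1/(1-ρ)²:

```python
import numpy as np

rng = np.random.default_rng(6)

# stand-in "Markov chain": stationary AR(1) with rho = 0.8 and unit-variance innovations,
# whose asymptotic variance in the CLT for the mean is 1 / (1 - rho)² = 25
rho, n = 0.8, 200_000
x = np.empty(n)
x[0] = rng.standard_normal() / np.sqrt(1.0 - rho**2)
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.standard_normal()

def truncated_asymptotic_variance(x, K):
    """gamma_0 + 2 * sum_{k=1}^{K} gamma_k, with empirical autocovariances gamma_k."""
    xc = x - x.mean()
    m = len(x)
    gam = np.array([np.dot(xc[: m - k], xc[k:]) / m for k in range(K + 1)])
    return gam[0] + 2.0 * gam[1:].sum()

for K in (5, 20, 100):
    print(f"truncation at K = {K:3d}:", round(truncated_asymptotic_variance(x, K), 2))
print("true asymptotic variance:", 1.0 / (1.0 - rho) ** 2)
```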

MCqMC 2014 [day #1]

Posted in pictures, Running, Statistics, Travel, University life on April 9, 2014 by xi'an


As I have been kindly invited to give a talk at MCqMC 2014, here I am in Leuven, Belgium, for a conference I have never attended before. (I was also invited to MCqMC 2012 in Sydney.) The talk topics and the attendees' “sociology” are quite similar to those of the IMACS meeting in Annecy last summer: namely, rather little on MCMC, particle filters, and other tools familiar in Bayesian computational statistics, but a lot on diffusions and stochastic differential equations and, of course, quasi-Monte Carlo methods. I thus find myself at a boundary of the conference range and a wee bit lost in some talks, whose very titles make little sense to me.

For instance, I have trouble connecting multi-level Monte Carlo with my own frame of reference. My understanding of the method is as a control-variate version of tempering, namely using a sequence of approximations to the true target and taking the rougher approximations as control variates for the finer ones. But I cannot find on the Web a statistical application of the method outside of diffusions and SDEs, i.e., outside of continuous-time processes… Maybe running a particle filter from one approximation to the next, in decreasing order of roughness, could help.
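To make my (possibly faulty) understanding concrete, here is a bare-bones multi-level sketch on a toy SDE, with all choices (geometric Brownian motion, Euler scheme, arbitrary per-level sample sizes rather than Giles' optimal allocation) being mine: the estimator telescopes the coarsest level plus corrections between consecutive levels, each correction using the same Brownian increments for the fine and coarse paths.

```python
import numpy as np

rng = np.random.default_rng(7)

# toy SDE: geometric Brownian motion dS = r S dt + sig S dW on [0, T], S_0 = 1,
# with quantity of interest E[S_T] = exp(r T), known exactly
r, sig, T = 0.05, 0.2, 1.0

def euler_pair(level, n_samples):
    """Euler estimates of S_T at discretisation levels `level` and `level - 1`,
    coupled by sharing the Brownian increments (coarse increments = pairwise sums)."""
    n_fine = 2**level
    h_fine = T / n_fine
    dW = np.sqrt(h_fine) * rng.standard_normal((n_samples, n_fine))
    s_fine = np.ones(n_samples)
    for k in range(n_fine):
        s_fine *= 1.0 + r * h_fine + sig * dW[:, k]
    if level == 0:
        return s_fine, None
    h_coarse = 2.0 * h_fine
    dW_coarse = dW[:, 0::2] + dW[:, 1::2]
    s_coarse = np.ones(n_samples)
    for k in range(n_fine // 2):
        s_coarse *= 1.0 + r * h_coarse + sig * dW_coarse[:, k]
    return s_fine, s_coarse

# telescoping sum E[P_L] = E[P_0] + sum_{l=1}^{L} E[P_l - P_{l-1}]
L, estimate = 5, 0.0
for level in range(L + 1):
    fine, coarse = euler_pair(level, 100_000 // 2**level)
    estimate += fine.mean() if coarse is None else (fine - coarse).mean()

print("multi-level estimate:", estimate, "  exact value:", np.exp(r * T))
```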

“Several years ago, Giles (2008) introduced an intriguing multi-level idea to deal with such biased settings that can dramatically improve the rate of convergence and can even, in some settings, achieve the canonical “square root” convergence rate associated with unbiased Monte Carlo.” Rhee and Glynn, 2012

Those were my thoughts before lunchtime today (namely April 7, 2014). And then, after lunch, Peter Glynn gave his plenary talk that just answered those questions of mine!!! Essentially, he showed the formula Pierre Jacob also used in his Bernoulli factory paper to transform a convergent biased estimator into an unbiased one, based on a telescopic series representation and a random truncation… This approach is described in a paper with Chang-han Rhee, arXived a few years ago. The talk also covered more recent work (presumably related to Chang-han Rhee's thesis) extending the above to Markov chains. As explained to me later by Pierre Jacob [of Statisfaction fame!], a regular chain does not converge fast enough to compensate for the explosive behaviour of the correction factor, which is why Rhee and Glynn used instead a backward chain, linking to the exact or perfect samplers of the 1990's (whose origin can be traced to a 1992 paper of Asmussen, Glynn and Thorisson). This was certainly the most riveting talk I attended in the past years in that it brought a direct answer to a question I was starting to investigate. And more. I was also wondering how connected it was with our “exact” representation of the stationary distribution (in an Annals of Probability paper with Jim Hobert), since we use a stopping rule based on renewal and a geometric waiting time, a somewhat empirical version of the inverse probability appearing in Peter's talk. This talk also led me to reconsider a recent discussion we had in my CREST office with Andrew about using square-root(ed) importance weights, since one of Peter's slides exhibited those square roots as optimal. Paradoxically, Peter started the talk by down-playing it, stating there was a single idea therein and a single important slide, making it a perfect after-lunch talk: I wish I had actually had thrice more time to examine each slide! (In the afternoon session, Éric Moulines also gave a thought-provoking talk on particle islands and double bootstrap, a research project I will comment on in more detail the day it gets arXived.)