Quasi-Monte Carlo sampling

Posted in Books, Kids, Statistics, Travel, University life, Wines on December 10, 2014 by xi'an

“The QMC algorithm forces us to write any simulation as an explicit function of uniform samples.” (p.8)

As posted a few days ago, Mathieu Gerber and Nicolas Chopin will read this afternoon a Paper to the Royal Statistical Society on their sequential quasi-Monte Carlo sampling paper.  Here are some comments on the paper that are preliminaries to my written discussion (to be sent before the slightly awkward deadline of Jan 2, 2015).

Quasi-Monte Carlo methods are definitely not popular within the (mainstream) statistical community, despite regular attempts by respected researchers like Art Owen and Pierre L'Écuyer to induce a wider use of those methods. It is thus to be hoped that the current attempt will be more successful, as being Read to the Royal Statistical Society is a major step towards a wide diffusion. I am looking forward to the collection of discussions that will result from the upcoming afternoon (and bemoan once again having to miss it!).

“It is also the resampling step that makes the introduction of QMC into SMC sampling non-trivial.” (p.3)

At a mathematical level, the fact that randomised low discrepancy sequences produce both unbiased estimators and error rates of order

$\mathfrak{O}(N^{-1}\log(N)^{d-1}) \text{ at cost } \mathfrak{O}(N\log(N))$

means that randomised quasi-Monte Carlo methods should always be used, instead of regular Monte Carlo methods! So why are they not always used?! The difficulty stands [I think] in expressing the Monte Carlo estimator as a deterministic function of a fixed number of uniforms (and possibly of past simulated values). At least this is why I never attempted to cross the Rubicon into the quasi-Monte Carlo realm… And maybe also why the step had to appear in connection with particle filters, which can be seen as dynamic importance sampling methods and hence enjoy a local iid-ness that relates better to quasi-Monte Carlo integrators than single-chain MCMC algorithms. For instance, each resampling step in a particle filter consists in a repeated multinomial generation, hence should have been turned into quasi-Monte Carlo ages ago. (However, rather than the basic solution drafted in Table 2, lower variance solutions like systematic and residual sampling have been proposed in the particle literature and I wonder if any of these is a special form of quasi-Monte Carlo.) In the present setting, the authors move further and apply quasi-Monte Carlo to the particles themselves. However, they still assume the deterministic transform

$\mathbf{x}_t^n = \Gamma_t(\mathbf{x}_{t-1}^n,\mathbf{u}_{t}^n)$

which is the very block on which I stumbled each time I contemplated quasi-Monte Carlo… So the fundamental difficulty with the whole proposal is that the generation from the Markov proposal

$m_t(\tilde{\mathbf{x}}_{t-1}^n,\cdot)$

has to be of the above form. Is the strength of this assumption discussed anywhere in the paper? All baseline distributions there are normal. And in cases where it does not easily apply, what would the gain be in only using the second step (i.e., quasi-Monte Carlo-ing the multinomial simulation from the empirical cdf)? In a sequential setting with unknown parameters θ, the transform is modified each time θ is modified and I wonder at the impact on computing cost if the inverse cdf is not available analytically. And I presume simulating the θ's cannot benefit from quasi-Monte Carlo improvements.
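To fix ideas on what "quasi-Monte Carlo-ing the multinomial simulation from the empirical cdf" could look like, here is a minimal sketch (my own construction, not taken from the paper): invert the empirical cdf of the weights at an ordered set of points in [0,1). Feeding it stratified uniforms, one per stratum, recovers what the particle literature calls stratified resampling, while a single common uniform u and points (i+u)/N gives systematic resampling, i.e. cdf inversion at a randomly shifted regular grid, arguably the simplest one-dimensional randomised quasi-Monte Carlo point set.

```python
import bisect, itertools, random

def qmc_resample(weights, uniforms):
    """Resample particle indices by inverting the empirical cdf of the
    (possibly unnormalised) weights at an ordered set of points in [0,1)."""
    cdf = list(itertools.accumulate(weights))
    total = cdf[-1]
    return [bisect.bisect_left(cdf, u * total) for u in sorted(uniforms)]

# stratified uniforms: one point per stratum [i/N, (i+1)/N) -- replacing
# random.random() by a single common uniform u would give systematic resampling
random.seed(1)
N = 4
us = [(i + random.random()) / N for i in range(N)]
print(qmc_resample([0.1, 0.2, 0.3, 0.4], us))
```

Plugging a scrambled low-discrepancy sequence in place of the stratified uniforms is then a one-line change, which is what makes the resampling step the natural entry point for quasi-Monte Carlo into particle filters.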

The paper obviously cannot get into every detail, but I would also welcome indications on the cost of deriving the Hilbert curve, in particular in connection with the dimension d, as it has to separate all of the N particles, and on the stopping rule on m that means only H_m is used.

Another question stands with the multiplicity of low discrepancy sequences and their impact on the overall convergence. If Art Owen’s (1997) nested scrambling leads to the best rate, as implied by Theorem 7, why should we ever consider another choice?
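One partial answer to the above may be computational: alternative randomisations are much cheaper to apply than nested scrambling. As a toy contrast (again my own sketch, not from the paper), the digital (XOR) shift in base 2 XORs the binary digits of every point with one common random bit pattern; it preserves the digital-net structure and yields unbiased estimates, but does not deliver the improved variance rate that Owen's nested scrambling can achieve.

```python
import random

def digital_shift(points, bits=32):
    """Digital (XOR) shift in base 2: XOR the binary digits of each point
    with one common random bit pattern -- an unbiased RQMC randomisation
    that preserves digital-net structure, unlike a plain uniform shift."""
    shift = random.getrandbits(bits)
    scale = 1 << bits
    return [(int(u * scale) ^ shift) / scale for u in points]

random.seed(7)
net = [0.0, 0.5, 0.25, 0.75]   # first points of the base-2 van der Corput sequence
print(digital_shift(net))
```

Since pairwise XORs between points are unchanged by the common shift, the shifted set is still a digital net, which is the whole point of this randomisation.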

In connection with Lemma 1 and the sequential quasi-Monte Carlo approximation of the evidence, I wonder at any possible Rao-Blackwellisation using all proposed moves rather than only those accepted. I mean, from a quasi-Monte Carlo viewpoint, is Rao-Blackwellisation easier and is it of any significant interest?

What are the computing costs and gains for forward and backward sampling? They do not seem to be discussed in the paper. I also fail to understand the trick at the end of 4.2.1, using SQMC on a single vector instead of (t+1) of them. Again assuming inverse cdfs are available? Any connection with Polson et al.'s particle learning literature?

Last questions: what is the (learning) effort for lazy me to move to SQMC? Any hope of stepping outside particle filtering?

6th French Econometrics Conference in Dauphine

Posted in Books, Kids, pictures, Statistics, University life on October 15, 2014 by xi'an

On December 4-5, Université Paris-Dauphine will host the 6th French Econometric Conference, which celebrates Christian Gouriéroux and his contributions to econometrics. (Christian was my statistics professor during my graduate years at ENSAE and then Head of CREST when I joined this research unit, first as a PhD student and later as Head of the statistics group. And he has always been a tremendous support for me.)

Not only is the program quite impressive, with co-authors of Christian Gouriéroux and a few Nobel laureates (including the latest one, Jean Tirole, who taught economics at ENSAE when I was a student there), but registration is free. I will most definitely attend the talks, as I am in Paris-Dauphine at this time of year (the week before NIPS). In particular, I am looking forward to Gallant's views on Bayesian statistics.

art brut

Posted in pictures, Running, Travel on July 26, 2014 by xi'an

faculty positions in statistics at ENSAE, Paris

Posted in Statistics, University life on April 28, 2014 by xi'an

Here is a call from ENSAE about two positions in statistics/machine learning, starting next semester:

ENSAE ParisTech and CREST are currently inviting applications for one position at the level of associate or full professor from outstanding candidates who have demonstrated abilities in both research and teaching. We are interested in candidates with a Ph.D. in Statistics or Machine Learning (or a related field) whose research interests are in high dimensional statistical inference, learning theory or statistics of networks.

The appointment could begin as soon as September 1, 2014. The position is for an initial three-year term, with a possible renewal option in case of a positive evaluation of research and teaching activities. Salary for suitably qualified applicants is competitive and commensurate with experience. The deadline for application is May 19, 2014. Full details are given here for the first position and there for the second position.

Dan Simpson’s seminar at CREST

Posted in Kids, Mountains, Statistics, Travel, University life on April 18, 2014 by xi'an

Daniel Simpson gave a seminar at CREST yesterday on his recently arXived paper, “Penalising model component complexity: A principled, practical approach to constructing priors”, written with Thiago Martins, Andrea Riebler, Håvard Rue, and Sigrunn Sørbye. (A paper he should also have presented in Banff last month, had he not lost his passport in København airport…) I have already commented at length on this exciting paper, hopefully to become a discussion paper in a top journal!, so I am just pointing out two things that came to my mind during the energetic talk delivered by Dan to our group. The first thing is that those penalised complexity (PC) priors of theirs rely on some choices in the ordering of the relevance, complexity, nuisance level, &tc. of the parameters, just like reference priors. While Dan already wrote a paper on Russian roulette, there is also a Russian doll principle at work behind (or within) PC priors. Each shell of the Russian doll corresponds to a further level of complexity whose order needs to be decided by the modeller… Not very realistic in a hierarchical model with several types of parameters having only local meaning.

My second point is that the construction of those “politically correct” (PC) priors reflects another Russian doll structure, namely one of embedded models, hence would and should lead to a natural multiple testing methodology. Except that Dan rejected this notion during his talk, by being opposed to testing per se. (A good topic for one of my summer projects, if nothing more, then!)

data scientist position

Posted in R, Statistics, University life on April 8, 2014 by xi'an

Our newly created Chaire “Economie et gestion des nouvelles données” in Paris-Dauphine, ENS Ulm, École Polytechnique and ENSAE is recruiting a data scientist, starting as early as May 1, with the call remaining open until the position is filled. The location is one of the above labs in Paris, the duration is at least one year, the salary varies based on the applicant’s profile, and the contacts are Stephane Gaiffas (stephane.gaiffas AT cmap DOT polytechnique.fr), Robin Ryder (ryder AT ceremade DOT dauphine.fr) and Gabriel Peyré (peyre AT ceremade DOT dauphine.fr). Here are more details:

Job description

The chaire “Economie et gestion des nouvelles données” is recruiting a talented young engineer specialized in large scale computing and data processing. The targeted applications include machine learning, imaging sciences and finance. This is a unique opportunity to join a newly created research group between the best Parisian labs in applied mathematics and computer science (Paris-Dauphine, ENS Ulm, École Polytechnique and ENSAE) working hand in hand with major industrial companies (Havas, BNP Paribas, Warner Bros.). The proposed position consists in helping researchers of the group to develop and implement large scale data processing methods, and in applying these methods to real life problems in collaboration with the industrial partners.

A non-exhaustive list of methods that are currently investigated by researchers of the group, and that will play a key role in the computational framework developed by the recruited engineer, includes:
● Large scale non-smooth optimization methods (proximal schemes, interior points, optimization on manifolds).
● Machine learning problems (kernelized methods, Lasso, collaborative filtering, deep learning, learning for graphs, learning for time-dependent systems), with a particular focus on large scale problems and stochastic methods.
● Imaging problems (compressed sensing, super-resolution).
● Approximate Bayesian Computation (ABC) methods.
● Particle and Sequential Monte Carlo methods.

Candidate profile

The candidate should have a very good background in computer science with various programming environments (e.g. Matlab, Python, C++) and knowledge of high performance computing methods (e.g. GPU, parallelization, cloud computing). He/she should adhere to the open source philosophy and possibly be able to interact with the relevant communities (e.g. the scikit-learn initiative). A typical curriculum includes engineering school or Masters studies in computer science / applied maths / physics, and possibly a PhD (not required).

Working environment

The recruited engineer will work within one of the labs of the chaire. He/she will benefit from a very stimulating working environment and all required computing resources. He/she will work in close interaction with the 4 research labs of the chaire, and will also have regular meetings with the industrial partners. More information about the chaire can be found online at http://www.di.ens.fr/~aspremon/chaire/

cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life on January 29, 2014 by xi'an

At MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea [although this happened to be one of those few sessions where the flu had not yet completely taken over!] and as I wanted to check a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed me to the relevant section of the BUGS book, reproduced above. (Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and is not connected in any way to Martyn’s talk or work!)

I cannot say I get the idea any clearer from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a cut is the removal of an edge in a probabilistic graph; however, there must/may be some arbitrariness in building the resulting “wrong” conditional distribution. In the Poisson-binomial case treated in Martyn’s talk, I interpret the cut as simulating from

$\pi(\phi|z)\pi(\theta|\phi,y)=\dfrac{\pi(\phi)f(z|\phi)}{m(z)}\dfrac{\pi(\theta|\phi)f(y|\theta,\phi)}{m(y|\phi)}$

instead of

$\pi(\phi|z,y)\pi(\theta|\phi,y)\propto\pi(\phi)f(z|\phi)\pi(\theta|\phi)f(y|\theta,\phi)$

hence losing some of the information about φ… Now, this cut version is a function of φ and θ that can be fed to a Metropolis-Hastings algorithm, assuming we can handle the posterior on φ and the conditional on θ given φ. If we build a Gibbs sampler instead, we face a difficulty with the normalising constant m(y|φ). Said Gibbs sampler thus does not work in generating from the “cut” target. Maybe an alternative could borrow from the rather large if disparate missing constant toolbox. (In any case, we do not simulate from the original joint distribution.) The natural solution would then be to make an independent proposal on φ with target the posterior given z and then any scheme that preserves the conditional of θ given φ and y; “any” is rather wishful thinking at this stage since the only practical solution I can see is to run a Metropolis-Hastings sampler long enough to “reach” stationarity… I also remain with a lingering although not life-threatening question of whether the BUGS code using cut distributions provides the “right” answer. Here are my five slides used during the seminar (with a random walk implementation that did not diverge from the true target…):
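For what it is worth, the two-stage scheme above can be sketched in a few lines (entirely my own toy construction, with made-up normal targets, not Martyn's code): random-walk Metropolis on φ targeting π(φ|z) alone, so that the information y carries about φ is indeed cut, with θ then drawn from its conditional π(θ|φ,y), assumed to be directly simulable here.

```python
import math, random

def cut_sampler(logpost_phi, sample_theta_given, phi0, n_iter, step):
    """Sketch of a naive 'cut' sampler: random-walk Metropolis on phi
    targeting pi(phi|z) only (the feedback from y to phi is cut), with
    theta drawn from its conditional pi(theta|phi, y) at each iteration."""
    phi, lp = phi0, logpost_phi(phi0)
    samples = []
    for _ in range(n_iter):
        prop = phi + random.gauss(0.0, step)        # random-walk proposal
        lp_prop = logpost_phi(prop)
        if math.log(random.random()) < lp_prop - lp:
            phi, lp = prop, lp_prop                 # accept the move
        samples.append((phi, sample_theta_given(phi)))
    return samples

# toy targets: phi|z ~ N(1,1) and theta|phi,y ~ N(phi, 0.5^2), purely illustrative
random.seed(0)
out = cut_sampler(lambda p: -0.5 * (p - 1.0) ** 2,
                  lambda p: random.gauss(p, 0.5), 0.0, 5000, 1.0)
mean_phi = sum(p for p, _ in out) / len(out)
print(mean_phi)   # should settle near 1, the mean of the phi-target
```

The point of the sketch is that nothing here requires the normalising constant m(y|φ), which is exactly what a Gibbs implementation of the cut target would stumble on.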