## Quasi-Monte Carlo sampling

Posted in Books, Kids, Statistics, Travel, University life, Wines on December 10, 2014 by xi'an

“The QMC algorithm forces us to write any simulation as an explicit function of uniform samples.” (p.8)

As posted a few days ago, Mathieu Gerber and Nicolas Chopin will read a Paper to the Royal Statistical Society this afternoon, on sequential quasi-Monte Carlo sampling. Here are some comments on the paper, as preliminaries to my written discussion (to be sent before the slightly awkward deadline of Jan 2, 2015).

Quasi-Monte Carlo methods are definitely not popular within the (mainstream) statistical community, despite regular attempts by respected researchers like Art Owen and Pierre L’Écuyer to induce more use of those methods. It is thus to be hoped that the current attempt will be more successful, a Paper Read to the Royal Statistical Society being a major step towards wide diffusion. I am looking forward to the collection of discussions that will result from the coming afternoon (and bemoan once again having to miss it!).

“It is also the resampling step that makes the introduction of QMC into SMC sampling non-trivial.” (p.3)

At a mathematical level, the fact that randomised low discrepancy sequences produce both unbiased estimators and error rates of order

$\mathfrak{O}(N^{-1}\log(N)^{d-1}) \text{ at cost } \mathfrak{O}(N\log(N))$

means that randomised quasi-Monte Carlo methods should always be used, instead of regular Monte Carlo methods! So why are they not always used?! The difficulty lies [I think] in expressing the Monte Carlo estimators in terms of a deterministic function of a fixed number of uniforms (and possibly of past simulated values). At least this is why I never attempted to cross the Rubicon into the quasi-Monte Carlo realm… And maybe also why the step had to appear in connection with particle filters, which can be seen as dynamic importance sampling methods and hence enjoy a local iid-ness that relates better to quasi-Monte Carlo integrators than single-chain MCMC algorithms. For instance, each resampling step in a particle filter consists of a repeated multinomial generation, hence should have been turned into quasi-Monte Carlo ages ago. (However, rather than the basic solution drafted in Table 2, lower variance solutions like systematic and residual sampling have been proposed in the particle literature and I wonder if any of these is a special form of quasi-Monte Carlo.) In the present setting, the authors move further and apply quasi-Monte Carlo to the particles themselves. However, they still assume the deterministic transform
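A minimal Python sketch of the resampling remark, not taken from the paper (and not a transcription of its Table 2): multinomial resampling pushes N iid uniforms through the inverse of the empirical cdf of the weights, while systematic resampling pushes through the grid (n+u)/N built from a single uniform u. That grid is a randomly shifted regular lattice in dimension one, which is at least suggestive of a quasi-Monte Carlo reading. All function names below are made up for the illustration.

```python
import bisect
import itertools
import random


def inverse_cdf_resample(weights, uniforms):
    """Map each uniform u to the first particle whose cumulative weight exceeds u."""
    total = sum(weights)
    cdf = list(itertools.accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against rounding in the last cumulative weight
    last = len(cdf) - 1
    return [min(bisect.bisect_right(cdf, u), last) for u in uniforms]


def multinomial_resample(weights, n_draws, rng):
    """Plain multinomial resampling: n_draws iid uniforms through the inverse cdf."""
    return inverse_cdf_resample(weights, [rng.random() for _ in range(n_draws)])


def systematic_resample(weights, n_draws, rng):
    """Systematic resampling: a single uniform shifts the regular grid n / n_draws."""
    shift = rng.random()
    grid = [(n + shift) / n_draws for n in range(n_draws)]
    return inverse_cdf_resample(weights, grid)


rng = random.Random(2014)  # arbitrary seed, for reproducibility only
weights = [0.15, 0.25, 0.6]
print("multinomial:", multinomial_resample(weights, 10, rng))
print("systematic: ", systematic_resample(weights, 10, rng))
```

With the systematic version, the number of copies of particle n stays within one unit of N times its weight, which is where the variance reduction comes from.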

$\mathbf{x}_t^n = \Gamma_t(\mathbf{x}_{t-1}^n,\mathbf{u}_{t}^n)$

which is the q-block on which I stumbled each time I contemplated quasi-Monte Carlo… So the fundamental difficulty with the whole proposal is that the generation from the Markov proposal

$m_t(\tilde{\mathbf{x}}_{t-1}^n,\cdot)$

has to be of the above form. Is the strength of this assumption discussed anywhere in the paper? All baseline distributions there are normal. And in the case it does not easily apply, what would the gain be in only using the second step (i.e., quasi-Monte Carlo-ing the multinomial simulation from the empirical cdf)? In a sequential setting with unknown parameters θ, the transform is modified each time θ is modified and I wonder at the impact on computing cost if the inverse cdf is not available analytically. And I presume simulating the θ’s cannot benefit from quasi-Monte Carlo improvements.
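To fix ideas on what the transform requires, here is a hedged Python sketch under assumptions of my choosing: a Gaussian AR(1) kernel, where Γ is available through the normal quantile function, and a generic bisection fallback for the case where only the cdf can be evaluated. The names `gamma_ar1` and `inverse_cdf_bisect`, and the values of `rho`, `sigma` and `tol`, are hypothetical and not from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gamma_ar1(x_prev, u, rho=0.9, sigma=1.0):
    """Gamma for a Gaussian AR(1) kernel: x_t = rho * x_prev + sigma * Phi^{-1}(u)."""
    return rho * x_prev + sigma * STD_NORMAL.inv_cdf(u)


def inverse_cdf_bisect(cdf, u, lo, hi, tol=1e-10):
    """Invert a continuous increasing cdf on [lo, hi] by bisection.

    Returns the approximate quantile and the number of cdf evaluations used.
    """
    n_evals = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        n_evals += 1
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), n_evals


# Same uniform, two routes to the quantile: closed-form versus numerical inversion.
u = 0.975
exact = STD_NORMAL.inv_cdf(u)
approx, n_evals = inverse_cdf_bisect(STD_NORMAL.cdf, u, -10.0, 10.0)
print(exact, approx, n_evals)
print(gamma_ar1(1.0, u))
```

In this toy setting the numerical route costs a few dozen cdf evaluations per particle and per time step, and the whole inversion has to be redone whenever θ changes, which is the overhead the question above is about.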

The paper obviously cannot get into every detail, but I would also welcome indications on the cost of deriving the Hilbert curve, in particular in connection with the dimension d as it has to separate all of the N particles, and on the stopping rule on m that means only $H_m$ is used.
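For a rough sense of the cost, here is a sketch in Python of the textbook two-dimensional Hilbert index, used to sort particles living in the unit square. It is only an illustration: the general-d construction is more involved, and the rule below for picking m (the smallest m putting all particles in distinct cells) is a naive reading on my part, not necessarily the stopping rule of the paper. The function names are hypothetical.

```python
def hilbert_index(x, y, m):
    """Position of integer cell (x, y) along the Hilbert curve on a 2^m x 2^m grid."""
    n = 1 << m
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the sub-curve is oriented properly
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def cell(point, m):
    """Integer grid cell containing a point of [0, 1)^2 at resolution 2^m."""
    n = 1 << m
    return tuple(min(int(c * n), n - 1) for c in point)


def smallest_separating_m(points, max_m=30):
    """Smallest m such that all points fall in distinct cells (naive rule)."""
    for m in range(1, max_m + 1):
        if len({cell(p, m) for p in points}) == len(points):
            return m
    raise ValueError("points not separated at the maximal resolution")


def hilbert_sort(points, m):
    """Sort points of [0, 1)^2 by the Hilbert index of their cell."""
    return sorted(points, key=lambda p: hilbert_index(*cell(p, m), m))


particles = [(0.9, 0.1), (0.1, 0.9), (0.1, 0.1), (0.9, 0.9)]
m = smallest_separating_m(particles)
print(m, hilbert_sort(particles, m))
```

In this two-dimensional version each index costs m bit-level steps, so sorting N particles is of order N(m + log N); how m has to grow with N and d to keep particles apart is exactly the question raised above.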

Another question stands with the multiplicity of low discrepancy sequences and their impact on the overall convergence. If Art Owen’s (1997) nested scrambling leads to the best rate, as implied by Theorem 7, why should we ever consider another choice?
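As a toy illustration of how the choice of sequence and of randomisation enters, the following Python sketch builds a Halton point set with a uniform random shift modulo one. This is a much cruder randomisation than Owen's nested scrambling and is not what Theorem 7 covers; it is only meant to show that each shifted point remains uniform (hence an unbiased estimator) while the point set keeps its even spread. The integrand and all names are assumptions made for the example.

```python
import math
import random


def radical_inverse(k, base):
    """Van der Corput radical inverse of the integer k in the given base."""
    inv, f = 0.0, 1.0 / base
    while k > 0:
        k, digit = divmod(k, base)
        inv += digit * f
        f /= base
    return inv


def shifted_halton(n_points, bases, rng):
    """First n_points of the Halton sequence, all shifted by one uniform vector mod 1."""
    shift = [rng.random() for _ in bases]
    return [
        tuple((radical_inverse(k, b) + s) % 1.0 for b, s in zip(bases, shift))
        for k in range(n_points)
    ]


def integrand(u):
    """Toy integrand exp(u_1 + ... + u_d), with integral (e - 1)^d over the unit cube."""
    return math.exp(sum(u))


def average(f, points):
    return sum(f(p) for p in points) / len(points)


rng = random.Random(1997)  # arbitrary seed
n = 1024
exact = (math.e - 1.0) ** 2
rqmc = average(integrand, shifted_halton(n, [2, 3], rng))
plain = average(integrand, [(rng.random(), rng.random()) for _ in range(n)])
print("exact:", exact)
print("shifted Halton error:", abs(rqmc - exact))
print("plain Monte Carlo error:", abs(plain - exact))
```

Swapping the shift for a proper scrambling, or Halton for another low discrepancy construction, only changes `shifted_halton`, which makes this kind of skeleton a cheap way to compare choices empirically.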

In connection with Lemma 1 and the sequential quasi-Monte Carlo approximation of the evidence, I wonder at any possible Rao-Blackwellisation using all proposed moves rather than only those accepted. I mean, from a quasi-Monte Carlo viewpoint, is Rao-Blackwellisation easier and is it of any significant interest?

What are the computing costs and gains for forward and backward sampling? They are not discussed there. I also fail to understand the trick at the end of 4.2.1, using SQMC on a single vector instead of (t+1) of them. Again assuming inverse cdfs are available? Any connection with Polson et al.’s particle learning literature?

Last questions: what is the (learning) effort for lazy me to move to SQMC? Any hope of stepping outside particle filtering?

## Methodological developments in evolutionary genomics [3 years postdoc in Montpellier]

Posted in pictures, Statistics, Travel, University life, Wines on November 26, 2014 by xi'an

[Here is a call for a post-doctoral position in Montpellier, South of France, not Montpelier, Vermont!, in a population genetics group with whom I am working. Highly recommended if you are currently looking for a postdoc!]

#### Three-year post-doctoral position at the Institute of Computational Biology (IBC), Montpellier (France) : Methodological developments in evolutionary genomics.

One young investigator position opens immediately at the Institute for Computational Biology (IBC) of Montpellier (France) to work on the development of innovative inference methods and software in population genomics or phylogenetics to analyze large-scale genomic data in the fields of health, agronomy and environment (Work Package 2 « evolutionary genomics » of the IBC). The candidate will develop their own research on some of the following topics: selective processes, demographic history, spatial genetic processes, reconstruction of very large phylogenies, gene/species tree reconciliation, using maximum likelihood, Bayesian and simulation-based inference. We are seeking a candidate with a strong background in mathematical and computational evolutionary biology, with interest in applications and software development. The successful candidate will work on their own project, built in collaboration with any researcher involved in the WP2 project and working at the IBC labs (AGAP, CBGP, ISEM, I3M, LIRMM, MIVEGEC).

IBC hires young investigators, typically with a PhD plus some post-doc experience, a high level of publishing, strong communication abilities, and a taste for multidisciplinary research. Working full-time at IBC, these young researchers will play a key role in Institute life. Most of their time will be devoted to scientific projects. In addition, they are expected to actively participate in the coordination of workpackages, in the hosting of foreign researchers and in the organization of seminars and events (summer schools, conferences…). In exchange, these young researchers will benefit from an exceptional environment thanks to the presence of numerous leading international researchers, not to mention significant autonomy for their work. Montpellier hosts one of the most vibrant communities of biodiversity research in Europe with several research centers of excellence in the field. This position is open for up to 3 years with a salary well above the French post-doc standards. Starting date is open to discussion.

The application deadline is January 31, 2015.

Living in Montpellier: http://www.agropolis.org/english/guide/index.html

#### Contacts at WP2 « Evolutionary Genetics » :

Jean-Michel Marin : http://www.math.univ-montp2.fr/~marin/

Olivier Gascuel : http://www.lirmm.fr/~gascuel/

Submit my application : http://www.ibc-montpellier.fr/open-positions/young-investigators#wp2-evolution

## Challis Lectures

Posted in Books, pictures, Statistics, Travel, University life, Wines on November 23, 2014 by xi'an

I had a great time during this short visit to the Department of Statistics, University of Florida, Gainesville. First, it was a major honour to be the 2014 recipient of the George H. Challis Award and I considerably enjoyed delivering my lectures on mixtures and on ABC with random forests, and chatting with members of the audience about the contents afterwards. Here is the physical award I brought back to my office:

More as a piece of trivia, here is the amount of information about the George H. Challis Award I found on the UF website:

This fund was established in 2000 by Jack M. and Linda Challis Gill and the Gill Foundation of Texas, in memory of Linda’s father, to support faculty and student conference travel awards and the George Challis Biostatistics Lecture Series. George H. Challis was born on December 8, 1911 and was raised in Italy and Indiana. He was the first cousin of Indiana composer Cole Porter. George earned a degree in 1933 from the School of Business at Indiana University in Bloomington. George passed away on May 6, 2000. His wife, Madeline, passed away on December 14, 2009.

Cole Porter, indeed!

On top of this lecturing activity, I had a full academic agenda, discussing our respective research themes with most faculty members and PhD students of the Department over the two days I was there, and it felt like there was not enough time! And then, during the few remaining hours when I did not try to stay on French time (!), I had a great time with my friends Jim and Maria in Gainesville, tasting a fantastic local IPA beer from Cigar City Brewery and several great (non-local) red wines… Adding to that a pile of new books, a smooth trip both ways, and a chance encounter with Alicia in Atlanta airport, it was a brilliant extended weekend!

## limbo IPA

Posted in pictures, Travel, Wines on November 20, 2014 by xi'an

## back in Gainesville (FL)

Posted in pictures, Running, Statistics, Travel, University life, Wines on November 12, 2014 by xi'an

Today, I am flying to Gainesville, Florida, for the rest of the week, to give a couple of lectures. More precisely, I have actually been nominated the 2014 Challis lecturer by the Department of Statistics there, following an impressive series of top statisticians (most of them close friends, is there a correlation there?!). I am quite excited to meet again with old friends and to be back at George’s University, if only for a little less than three days. (There is a certain trend in those Fall trips as I have been going for a few days and two talks to the USA or Canada for the past three Falls: to Ames and Chicago in 2012, to Pittsburgh (CMU) and Toronto in 2013…)

## Domaine Ollier Taillefer [Faugères]

Posted in Travel, Wines on October 20, 2014 by xi'an