## Quasi-Monte Carlo sampling

Posted in Books, Kids, Statistics, Travel, University life, Wines on December 10, 2014 by xi'an

“The QMC algorithm forces us to write any simulation as an explicit function of uniform samples.” (p.8)

As posted a few days ago, Mathieu Gerber and Nicolas Chopin will read a Paper on sequential quasi-Monte Carlo sampling to the Royal Statistical Society this afternoon. Here are some comments on the paper that are preliminaries to my written discussion (to be sent before the slightly awkward deadline of Jan 2, 2015).

Quasi-Monte Carlo methods are definitely not popular within the (mainstream) statistical community, despite regular attempts by respected researchers like Art Owen and Pierre L’Écuyer to induce more use of those methods. It is thus to be hoped that the current attempt will be more successful, since being Read to the Royal Statistical Society is a major step towards a wide diffusion. I am looking forward to the collection of discussions that will result from the coming afternoon (and bemoan once again having to miss it!).

“It is also the resampling step that makes the introduction of QMC into SMC sampling non-trivial.” (p.3)

At a mathematical level, the fact that randomised low discrepancy sequences produce both unbiased estimators and error rates of order

$\mathfrak{O}(N^{-1}\log(N)^{d-1}) \text{ at cost } \mathfrak{O}(N\log(N))$

means that randomised quasi-Monte Carlo methods should always be used, instead of regular Monte Carlo methods! So why are they not always used?! The difficulty lies [I think] in expressing the Monte Carlo estimators in terms of a deterministic function of a fixed number of uniforms (and possibly of past simulated values). At least this is why I never attempted to cross the Rubicon into the quasi-Monte Carlo realm… And maybe also why the step had to appear in connection with particle filters, which can be seen as dynamic importance sampling methods and hence enjoy a local iid-ness that relates better to quasi-Monte Carlo integrators than single-chain MCMC algorithms. For instance, each resampling step in a particle filter consists of a repeated multinomial generation, hence should have been turned into quasi-Monte Carlo ages ago. (However, rather than the basic solution drafted in Table 2, lower variance solutions like systematic and residual sampling have been proposed in the particle literature and I wonder if any of these is a special form of quasi-Monte Carlo.) In the present setting, the authors move further and apply quasi-Monte Carlo to the particles themselves. However, they still assume the deterministic transform

$\mathbf{x}_t^n = \Gamma_t(\mathbf{x}_{t-1}^n,\mathbf{u}_{t}^n)$

which is the q-block on which I stumbled each time I contemplated quasi-Monte Carlo… So the fundamental difficulty with the whole proposal is that the generation from the Markov proposal

$m_t(\tilde{\mathbf{x}}_{t-1}^n,\cdot)$

has to be of the above form. Is the strength of this assumption discussed anywhere in the paper? All baseline distributions there are normal. And in case it does not easily apply, what would the gain be in only using the second step (i.e., quasi-Monte Carlo-ing the multinomial simulation from the empirical cdf)? In a sequential setting with unknown parameters θ, the transform is modified each time θ is modified and I wonder at the impact on computing cost if the inverse cdf is not available analytically. And I presume simulating the θ’s cannot benefit from quasi-Monte Carlo improvements.
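To make the constraint concrete, here is a minimal Python sketch (mine, not the authors') of a proposal written as an explicit function of uniforms, together with the multinomial step turned into an inversion of the empirical cdf. The Gaussian random walk, the scale `sigma` and the names `gamma_t`, `inverse_cdf_resample` and `systematic_points` are all assumptions made for illustration. The points used by systematic resampling are included only because they are a one-dimensional randomly shifted grid, which is what makes the question of a quasi-Monte Carlo connection natural.

```python
import bisect
import itertools
from statistics import NormalDist


def gamma_t(x_prev, u, sigma=1.0):
    """Hypothetical Gaussian random-walk move written as a function of a uniform:
    x_t = x_{t-1} + sigma * Phi^{-1}(u), with 0 < u < 1."""
    return x_prev + sigma * NormalDist().inv_cdf(u)


def inverse_cdf_resample(weights, points):
    """Ancestor indices obtained by inverting the weighted empirical cdf
    at the given points of (0, 1)."""
    total = sum(weights)
    cdf = list(itertools.accumulate(w / total for w in weights))
    last = len(weights) - 1
    # min(...) guards against round-off leaving the last cdf value below 1
    return [min(bisect.bisect_left(cdf, p), last) for p in points]


def systematic_points(n, u):
    """The n points (i + u) / n of systematic resampling, built from one uniform u."""
    return [(i + u) / n for i in range(n)]


weights = [0.1, 0.2, 0.3, 0.4]
ancestors = inverse_cdf_resample(weights, systematic_points(4, 0.5))
moved = [gamma_t(0.0, u) for u in (0.25, 0.5, 0.75)]
```

When the inverse cdf has no closed form, `gamma_t` is precisely the piece that becomes expensive, while `inverse_cdf_resample` remains usable whatever the model.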

The paper obviously cannot get into every detail, but I would also welcome indications on the cost of deriving the Hilbert curve, in particular in connection with the dimension d, as it has to separate all of the N particles, and on the stopping rule on m that means only $H_m$ is used.
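For readers unfamiliar with the Hilbert ordering, the classical bit-manipulation routine below computes the index of a cell along a Hilbert curve and sorts points by it. It is only a sketch for d=2 on a 2^m × 2^m grid: the function names and the default value of m are my assumptions, and the paper's construction in general dimension d is not reproduced here.

```python
def hilbert_index(m, x, y):
    """Index of cell (x, y) of a 2^m x 2^m grid along the Hilbert curve (d = 2)."""
    n = 1 << m
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so that the sub-curve has the standard orientation
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(points, m=10):
    """Sort points of [0,1)^2 by the Hilbert index of the grid cell containing them."""
    n = 1 << m

    def key(p):
        cx = min(int(p[0] * n), n - 1)
        cy = min(int(p[1] * n), n - 1)
        return hilbert_index(m, cx, cy)

    return sorted(points, key=key)


particles = [(0.9, 0.1), (0.1, 0.1), (0.1, 0.9), (0.9, 0.9)]
ordered = hilbert_sort(particles)
```

In this two-dimensional version each index takes a number of operations proportional to m, and m must be large enough for distinct particles to fall in distinct cells, which is where my question about the dependence on d and N comes from.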

Another question stands with the multiplicity of low discrepancy sequences and their impact on the overall convergence. If Art Owen’s (1997) nested scrambling leads to the best rate, as implied by Theorem 7, why should we ever consider another choice?
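As a rough illustration of the gap between plain and randomised quasi-Monte Carlo, the Python sketch below compares iid sampling with a randomly shifted Halton sequence on a toy integral. Mind the assumptions: the Halton sequence with a random shift modulo 1 is swapped in because it fits in a few lines. It is not Owen's nested scrambling and should not be expected to reach its rate. The integrand, sample size, seeds and function names are likewise illustrative choices of mine.

```python
import math
import random


def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv


def shifted_halton(n, rng, bases=(2, 3)):
    """First n Halton points in [0,1)^2, all shifted modulo 1 by one uniform vector,
    so that each point is marginally uniform and the estimator is unbiased."""
    shift = [rng.random() for _ in bases]
    return [tuple((radical_inverse(i, b) + s) % 1.0 for b, s in zip(bases, shift))
            for i in range(n)]


def iid_uniforms(n, rng, dim=2):
    """n iid uniform points in [0,1)^dim (regular Monte Carlo)."""
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


def rmse(sampler, f, truth, n, reps, seed):
    """Root mean squared error of the sample average over independent replications."""
    rng = random.Random(seed)
    errs = []
    for _ in range(reps):
        pts = sampler(n, rng)
        errs.append((sum(f(u) for u in pts) / n - truth) ** 2)
    return math.sqrt(sum(errs) / reps)


def f(u):
    # toy integrand whose integral over [0,1]^2 is 1/4
    return u[0] * u[1]


mc_rmse = rmse(iid_uniforms, f, 0.25, n=2048, reps=25, seed=1)
rqmc_rmse = rmse(shifted_halton, f, 0.25, n=2048, reps=25, seed=1)
print(mc_rmse, rqmc_rmse)
```

A single smooth integrand in dimension two says nothing about which randomisation is best, of course, but it shows how little code is needed once the estimator is an explicit function of uniforms.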

In connection with Lemma 1 and the sequential quasi-Monte Carlo approximation of the evidence, I wonder at any possible Rao-Blackwellisation using all proposed moves rather than only those accepted. I mean, from a quasi-Monte Carlo viewpoint, is Rao-Blackwellisation easier and is it of any significant interest?

What are the computing costs and gains for forward and backward sampling? They are not discussed there. I also fail to understand the trick at the end of 4.2.1, using SQMC on a single vector instead of (t+1) of them. Again assuming inverse cdfs are available? Any connection with Polson et al.’s particle learning literature?

Last questions: what is the (learning) effort for lazy me to move to SQMC? Any hope of stepping outside particle filtering?

## Relevant statistics for Bayesian model choice [hot off the press!]

Posted in Books, Statistics, University life on October 30, 2014 by xi'an

Our paper about evaluating statistics used for ABC model choice has just appeared in Series B! It is somewhat paradoxical that it comes out just a few days after we submitted our paper on using random forests for Bayesian model choice, thus bypassing the need for selecting those summary statistics by incorporating all statistics available and letting the trees automatically rank those statistics in terms of their discriminating power. Nonetheless, this paper remains an exciting piece of work (!) as it addresses the more general and pressing question of the validity of running a Bayesian analysis with only part of the information contained in the data. Quite useful in my (biased) opinion when considering the emergence of approximate inference already discussed on this ‘Og…

[As a trivial aside, I had first used fresh from the press(es) as the bracketed comment, before I realised the meaning was not necessarily the same in English and in French.]

## Series B reaches 5.721 impact factor!

Posted in Books, Statistics, University life on September 15, 2014 by xi'an

I received this email from Wiley with the great figure that JRSS Series B has now reached a 5.721 impact factor. Which makes it the first journal in Statistics from this perspective. Congrats to editors Gareth Roberts, Piotr Fryzlewicz and Ingrid Van Keilegom for this achievement! An amazing jump from the 2009 figure of 2.84…!

## this issue of Series B

Posted in Books, Statistics, Travel, University life on September 5, 2014 by xi'an

The September issue of [JRSS] Series B I received a few days ago is of particular interest to me. (And not as an ex-co-editor since I was never involved in any of those papers!) To wit: a paper by Hani Doss and Aixin Tan on evaluating normalising constants based on MCMC output, a preliminary version of which I had seen at a previous JSM meeting; a paper by Nick Polson, James Scott and Jesse Windle on the Bayesian bridge, connected with Nick’s talk in Boston earlier this month; and yet another paper by Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar and Michael Jordan on the bag of little bootstraps, whose presentation I heard Michael deliver a few times when he was in Paris. (Obviously, this does not imply any negative judgement on the other papers of this issue!)

For instance, Doss and Tan consider the multiple mixture estimator [my wording, the authors do not give the method a name, referring to Vardi (1985) but missing the connection with Owen and Zhou (2000)] of k ratios of normalising constants, namely

$\sum_{l=1}^k \frac{1}{n_l} \sum_{t=1}^{n_l} \dfrac{n_l g_j(x_t^l)}{\sum_{s=1}^k n_s g_s(x_t^l) z_1/z_s } \longrightarrow \dfrac{z_j}{z_1}$

where the z’s are the normalising constants and with possibly different numbers of iterations of each Markov chain. An interesting starting point (that Hans Künsch had mentioned to me a while ago but that I had since then forgotten) is that the problem was reformulated by Charlie Geyer (1994) as a quasi-likelihood estimation where the ratios of all z’s relative to one reference density are the unknowns. This is doubly interesting, actually, because it restates the constant estimation problem in a statistical light and thus somewhat relates to the infamous “paradox” raised by Larry Wasserman a while ago. The novelty in the paper is (a) to derive an optimal estimator of the ratios of normalising constants in the Markov case, essentially accounting for possibly different lengths of the Markov chains, and (b) to estimate the variance matrix of the ratio estimate by regeneration arguments. A favourite tool of mine, at least theoretically as practically useful minorising conditions are hard to come by, if at all available.
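Since the display above involves the unknown ratios on both sides, one simple way to use it is as a fixed-point equation, iterated until the ratios stabilise. The Python sketch below does just that on a toy case. It is a plain illustration under my own assumptions (iid Gaussian draws instead of Markov chains, two unnormalised densities, hypothetical function and variable names), not the optimal estimator of Doss and Tan, and it says nothing about variance estimation by regeneration.

```python
import math
import random


def ratio_estimates(samples, log_g, tol=1e-8, max_iter=200):
    """Iterate r_j = sum_x g_j(x) / sum_s n_s g_s(x) / r_s over the pooled draws,
    where r_j stands for z_j / z_1 and r_1 is renormalised to 1 at each step.

    samples[l] holds the draws from the l-th density and log_g[j](x) is the
    log of the j-th unnormalised density.
    """
    k = len(samples)
    n = [len(s) for s in samples]
    pooled = [x for s in samples for x in s]
    g = [[math.exp(log_g[j](x)) for x in pooled] for j in range(k)]
    r = [1.0] * k
    for _ in range(max_iter):
        denom = [sum(n[s] * g[s][i] / r[s] for s in range(k))
                 for i in range(len(pooled))]
        new = [sum(g[j][i] / denom[i] for i in range(len(pooled))) for j in range(k)]
        new = [v / new[0] for v in new]
        converged = max(abs(a - b) for a, b in zip(new, r)) < tol
        r = new
        if converged:
            break
    return r


rng = random.Random(0)
# toy case: g_1 is an unnormalised N(0,1) density, g_2 an unnormalised N(0,4) one,
# so that z_2 / z_1 = 2
samples = [[rng.gauss(0.0, 1.0) for _ in range(2000)],
           [rng.gauss(0.0, 2.0) for _ in range(2000)]]
log_g = [lambda x: -0.5 * x * x, lambda x: -x * x / 8.0]
r = ratio_estimates(samples, log_g)
print(r)
```

Note that the factors n_l in front of and inside the inner sum of the display cancel, which is why the code simply pools all draws.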

## Statistics and Computing special MCMSk’issue [call for papers]

Posted in Books, Mountains, R, Statistics, University life on February 7, 2014 by xi'an

Following the exciting and innovative talks, posters and discussions at MCMski IV, the editor of Statistics and Computing, Mark Girolami (who also happens to be the new president-elect of the BayesComp section of ISBA, which is taking over the management of future MCMski meetings), kindly proposed to publish a special issue of the journal open to all participants in the meeting. Not only to speakers, mind, but to all participants.

So if you are interested in submitting a paper to this special issue of a computational statistics journal that is very close to our MCMski themes, I encourage you to do so. (Especially if you missed the COLT 2014 deadline!) The deadline for submissions is set for March 15 (a wee bit tight, but we would dearly like to publish the issue in 2014, namely the same year as the meeting). Submissions are to be made through the Statistics and Computing portal, with a mention that they are intended for the special issue.

An editorial committee chaired by Antonietta Mira and composed of Christophe Andrieu, Brad Carlin, Nicolas Chopin, Jukka Corander, Colin Fox, Nial Friel, Chris Holmes, Gareth Jones, Peter Müller, Geoff Nicholls, Gareth Roberts, Håvård Rue, Robin Ryder, and myself, will examine the submissions and get back to the authors within a few weeks. In a spirit similar to the JRSS Read Paper procedure, submissions will first be examined collectively, before being sent to referees. We plan to publish the reviews as well, in order to include a global set of comments on the accepted papers. We intend to do it in The Economist style, i.e. as a set of edited anonymous comments. Usual instructions for Statistics and Computing apply, with the additional requirements that the paper should be around 10 pages and include at least one author who took part in MCMski IV.

## Series B news

Posted in Books, Statistics, University life on January 24, 2014 by xi'an

The Journal of the Royal Statistical Society, Series B, has a new cover, a new colour and a new co-editor. As can be seen from the above shots, the colour is now a greenish ochre, with a picture of pedestrians on a brick plaza as a background, not much related to statistical methodology as far as I can tell. More importantly, the new co-editor for the coming four years is Piotr Fryzlewicz, professor at the London School of Economics, who will share the burden with Ingrid van Keilegom, professor at UCL (Louvain-la-Neuve), who is now starting her third year… My friend, colleague and successor as Series B editor Gareth Roberts is now retiring after four years of hard work towards making Series B one of the top journals in Statistics. Thanks Gareth and best wishes to Ingrid and Piotr!

## relevant statistics for Bayesian model choice (#4)

Posted in pictures, Statistics, Travel, University life on August 23, 2013 by xi'an

I have just posted on arXiv the fourth (and hopefully final) version of our paper, Relevant statistics for Bayesian model choice, written jointly with Jean-Michel Marin, Natesh Pillai, and Judith Rousseau over the past two years. As we received a very positive response from the editorial team at JRSS Series B, I flew to Montpellier today to write & resubmit a revised version of the paper. The changes are only stylistic, since we could not answer in depth a query about the apparently different speeds of convergence of the posterior probabilities under the Gaussian and Laplace distributions in Figures 3 & 4 (see paper). This was a most interesting question in that the marginal likelihoods do indeed seem to converge at different speeds. However, the only precise information we can derive from our result (Theorem 1) is when the Bayes factor is not consistent. Otherwise, we only have a lower bound on its speed of convergence (under the correct model). Getting precise speeds in this case sounds beyond our reach… (Unless I am confused with time zones, this post should come alive just after the fourth version is announced on arXiv…)