## NIPS 2014

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on December 15, 2014 by xi'an

Second and last day of the NIPS workshops! The collection of topics was quite broad and would have made my choosing an ordeal, except that I was invited to give a talk at the probabilistic programming workshop, solving my dilemma… The first talk by Kathleen Fisher was quite enjoyable in that it gave a conceptual discussion of the motivations for probabilistic languages, drawing an analogy with the early days of computer programming that saw a separation between higher level computer languages and machine programming, with a compiler interface. And calling for a similar separation between the models faced by statistical inference and machine-learning and the corresponding code, if I understood her correctly. This was connected with Frank Wood’s talk of the previous day where he illustrated the concept through a generation of computer codes to approximately generate from standard distributions like Normal or Poisson. Approximately as in ABC, which is why the organisers invited me to talk in this session. However, I was a wee bit lost in the following talks and presumably lost part of my audience during my talk, as I realised later to my dismay when someone told me he had not perceived the distinction between the trees in the random forest procedure and the phylogenetic trees in the population genetic application. Still, while it had for me a sort of Twilight Zone feeling of having stepped in another dimension, attending this workshop was an worthwhile experiment as an eye-opener into a highly different albeit connected field, where code and simulator may take the place of a likelihood function… To the point of defining Hamiltonian Monte Carlo directly on the former, as Vikash Mansinghka showed me at the break.

I completed the day with the final talks in the variational inference workshop, if only to get back on firmer ground! Apart from attending my third talk by Vikash in the conference (but on a completely different topic on variational approximations for discrete particle-ar distributions), a talk by Tim Salimans linked MCMC and variational approximations, using MCMC and HMC to derive variational bounds. (He did not expand on the opposite use of variational approximations to build better proposals.) Overall, I found these two days and my first NIPS conference quite exciting, if somewhat overpowering, with a different atmosphere and a different pace compared with (small or large) statistical meetings. (And a staggering gender imbalance!)

## ABC à Montréal

Posted in Kids, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on December 13, 2014 by xi'an

So today was the NIPS 2014 workshop, “ABC in Montréal“, which started with a fantastic talk by Juliane Liepe on some exciting applications of ABC to the migration of immune cells, with the analysis of movies involving those cells acting to heal a damaged fly wing and a cut fish tail. Quite amazing videos, really. (With the great entry line of ‘We have all cut  a finger at some point in our lives’!) The statistical model behind those movies was a random walk on a grid, with different drift and bias features that served as model characteristics. Frank Wood managed to deliver his talk despite a severe case of food poisoning, with a great illustration of probabilistic programming that made me understand (at last!) the very idea of probabilistic programming. And  Vikash Mansinghka presented some applications in image analysis. Those two talks led me to realise why probabilistic programming was so close to ABC, with a programming touch! Hence why I was invited to talk today! Then Dennis Prangle exposed his latest version of lazy ABC, that I have already commented on the ‘Og, somewhat connected with our delayed acceptance algorithm, to the point that maybe something common can stem out of the two notions. Michael Blum ended the day with provocative answers to the provocative question of Ted Meeds as to whether or not machine learning needed ABC (Ans. No!) and whether or not machine learning could help ABC (Ans. ???). With an happily mix-up between mechanistic and phenomenological models that helped generating discussion from the floor.

The posters were also of much interest, with calibration as a distance measure by Michael Guttman, in continuation of the poster he gave at MCMski, Aaron Smith presenting his work with Luke Bornn, Natesh Pillai and Dawn Woodard, on why a single pseudo-sample is enough for ABC efficiency. This gave me the opportunity to discuss with him the apparent contradiction with the result of Kryz Łatunsziński and Anthony Lee about the geometric convergence of ABC-MCMC only attained with a random number of pseudo-samples… And to wonder if there is a geometric versus binomial dilemma in this setting, Namely, whether or not simulating pseudo-samples until one is accepted would be more efficient than just running one and discarding it in case it is too far. So, although the audience was not that large (when compared with the other “ABC in…” and when considering the 2500+ attendees at NIPS over the week!), it was a great day where I learned a lot, did not have a doze during talks (!), [and even had an epiphany of sorts at the treadmill when I realised I just had to take longer steps to reach 16km/h without hyperventilating!] So thanks to my fellow organisers, Neil D Lawrence, Ted Meeds, Max Welling, and Richard Wilkinson for setting the program of that day! And, by the way, where’s the next “ABC in…”?! (Finland, maybe?)

## Quasi-Monte Carlo sampling

Posted in Books, Kids, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , on December 10, 2014 by xi'an

“The QMC algorithm forces us to write any simulation as an explicit function of uniform samples.” (p.8)

As posted a few days ago, Mathieu Gerber and Nicolas Chopin will read this afternoon a Paper to the Royal Statistical Society on their sequential quasi-Monte Carlo sampling paper.  Here are some comments on the paper that are preliminaries to my written discussion (to be sent before the slightly awkward deadline of Jan 2, 2015).

Quasi-Monte Carlo methods are definitely not popular within the (mainstream) statistical community, despite regular attempts by respected researchers like Art Owen and Pierre L’Écuyer to induce more use of those methods. It is thus to be hoped that the current attempt will be more successful, it being Read to the Royal Statistical Society being a major step towards a wide diffusion. I am looking forward to the collection of discussions that will result from the incoming afternoon (and bemoan once again having to miss it!).

“It is also the resampling step that makes the introduction of QMC into SMC sampling non-trivial.” (p.3)

At a mathematical level, the fact that randomised low discrepancy sequences produce both unbiased estimators and error rates of order

$\mathfrak{O}(N^{-1}\log(N)^{d-}) \text{ at cost } \mathfrak{O}(N\log(N))$

means that randomised quasi-Monte Carlo methods should always be used, instead of regular Monte Carlo methods! So why is it not always used?! The difficulty stands [I think] in expressing the Monte Carlo estimators in terms of a deterministic function of a fixed number of uniforms (and possibly of past simulated values). At least this is why I never attempted at crossing the Rubicon into the quasi-Monte Carlo realm… And maybe also why the step had to appear in connection with particle filters, which can be seen as dynamic importance sampling methods and hence enjoy a local iid-ness that relates better to quasi-Monte Carlo integrators than single-chain MCMC algorithms.  For instance, each resampling step in a particle filter consists in a repeated multinomial generation, hence should have been turned into quasi-Monte Carlo ages ago. (However, rather than the basic solution drafted in Table 2, lower variance solutions like systematic and residual sampling have been proposed in the particle literature and I wonder if any of these is a special form of quasi-Monte Carlo.) In the present setting, the authors move further and apply quasi-Monte Carlo to the particles themselves. However, they still assume the deterministic transform

$\mathbf{x}_t^n = \Gamma_t(\mathbf{x}_{t-1}^n,\mathbf{u}_{t}^n)$

which the q-block on which I stumbled each time I contemplated quasi-Monte Carlo… So the fundamental difficulty with the whole proposal is that the generation from the Markov proposal

$m_t(\tilde{\mathbf{x}}_{t-1}^n,\cdot)$

has to be of the above form. Is the strength of this assumption discussed anywhere in the paper? All baseline distributions there are normal. And in the case it does not easily apply, what would the gain bw in only using the second step (i.e., quasi-Monte Carlo-ing the multinomial simulation from the empirical cdf)? In a sequential setting with unknown parameters θ, the transform is modified each time θ is modified and I wonder at the impact on computing cost if the inverse cdf is not available analytically. And I presume simulating the θ’s cannot benefit from quasi-Monte Carlo improvements.

The paper obviously cannot get into every detail, obviously, but I would also welcome indications on the cost of deriving the Hilbert curve, in particular in connection with the dimension d as it has to separate all of the N particles, and on the stopping rule on m that means only Hm is used.

Another question stands with the multiplicity of low discrepancy sequences and their impact on the overall convergence. If Art Owen’s (1997) nested scrambling leads to the best rate, as implied by Theorem 7, why should we ever consider another choice?

In connection with Lemma 1 and the sequential quasi-Monte Carlo approximation of the evidence, I wonder at any possible Rao-Blackwellisation using all proposed moves rather than only those accepted. I mean, from a quasi-Monte Carlo viewpoint, is Rao-Blackwellisation easier and is it of any significant interest?

What are the computing costs and gains for forward and backward sampling? They are not discussed there. I also fail to understand the trick at the end of 4.2.1, using SQMC on a single vector instead of (t+1) of them. Again assuming inverse cdfs are available? Any connection with the Polson et al.’s particle learning literature?

Last questions: what is the (learning) effort for lazy me to move to SQMC? Any hope of stepping outside particle filtering?

## amazonish thanks (& repeated warning)

Posted in Books, Kids, R, Statistics with tags , , , , , on December 9, 2014 by xi'an

As in previous years, at about this time, I want to (re)warn unaware ‘Og readers that all links to Amazon.com and more rarely to Amazon.fr found on this blog are actually susceptible to earn me an advertising percentage if a purchase is made by the reader in the 24 hours following the entry on Amazon through this link, thanks to the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to amazon.com/fr. Unlike last year, I did not benefit as much from the new edition of Andrew’s book, and the link he copied from my blog entry… Here are some of the most Og-unrelated purchases:

Once again, books I reviewed, positively or negatively, were among the top purchases… Like a dozen Monte Carlo simulation and resampling methods for social science , a few copies of Naked Statistics. And again a few of The Cartoon Introduction to Statistics. (Despite a most critical review.) Thanks to all of you using those links and feeding further my book addiction, with the drawback of inducing even more fantasy book reviews.

## the demise of the Bayes factor

Posted in Books, Kids, Statistics, Travel, University life with tags , , , , , , , , on December 8, 2014 by xi'an

With Kaniav Kamary, Kerrie Mengersen, and Judith Rousseau, we have just arXived (and submitted) a paper entitled “Testing hypotheses via a mixture model”. (We actually presented some earlier version of this work in Cancũn, Vienna, and Gainesville, so you may have heard of it already.) The notion we advocate in this paper is to replace the posterior probability of a model or an hypothesis with the posterior distribution of the weights of a mixture of the models under comparison. That is, given two models under comparison,

$\mathfrak{M}_1:x\sim f_1(x|\theta_1) \text{ versus } \mathfrak{M}_2:x\sim f_2(x|\theta_2)$

we propose to estimate the (artificial) mixture model

$\mathfrak{M}_{\alpha}:x\sim\alpha f_1(x|\theta_1) + (1-\alpha) f_2(x|\theta_2)$

and in particular derive the posterior distribution of α. One may object that the mixture model is neither of the two models under comparison but this is the case at the boundary, i.e., when α=0,1. Thus, if we use prior distributions on α that favour the neighbourhoods of 0 and 1, we should be able to see the posterior concentrate near 0 or 1, depending on which model is true. And indeed this is the case: for any given Beta prior on α, we observe a higher and higher concentration at the right boundary as the sample size increases. And establish a convergence result to this effect. Furthermore, the mixture approach offers numerous advantages, among which [verbatim from the paper]:

## Statistics slides (5)

Posted in Books, Kids, Statistics, University life with tags , , , , , on December 7, 2014 by xi'an

Here is the fifth and last set of slides for my third year statistics course, trying to introduce Bayesian statistics in the most natural way and hence starting with… Rasmus’ socks and ABC!!! This is an interesting experiment as I have no idea how my students will react. Either they will see the point besides the anecdotal story or they’ll miss it (being quite unhappy so far about the lack of mathematical rigour in my course and exercises…). We only have two weeks left so I am afraid the concept will not have time to seep through!