## capture mark recapture with no mark and no recapture [aka 23andmyfish]

Posted in Mountains, pictures, Statistics, Travel, University life on June 11, 2015 by xi'an

A very exciting talk at NBBC15 here in Reykjavik was delivered by Mark Bravington yesterday on Close-kin mark recapture by modern magic (!). Although Mark is from Australia, being a Hobart resident does qualify him for the Nordic branch of the conference! The exciting idea is to use genetic markers to link catches in a (fish) population as being related as parent-offspring or as siblings. This sounds like science-fantasy when you first hear of it, but it is actually working better than standard capture-mark-recapture methods for populations of a certain size (so that the chances of finding related animals are not absolutely zero, as they are for, e.g., krill populations). The talk was focussed on bluefin tuna, whose survival is unlikely under the current fishing pressure… Among the advantages are a much more limited impact of the capture on the animal, since only a small amount of genetic material is needed, no tag loss, no tag destruction by hunters, no tag impact on the animal's survival, no recapture, a unique identification of each animal, and the potential for a detailed amount of information through the genetic record. Ideally, the entire sample could lead to a reconstruction of its genealogy all the way to the common ancestor, a wee bit like what 23andme proposes for humans, but this remains at the science-fantasy level given what is currently known about the fish species genomes.

## scalable Bayesian inference for the inverse temperature of a hidden Potts model

Posted in Books, R, Statistics, University life on April 7, 2015 by xi'an

Matt Moores, Tony Pettitt, and Kerrie Mengersen arXived a paper yesterday comparing different computational approaches to the processing of hidden Potts models and of the intractable normalising constant in the Potts model. This is a very interesting paper, first because it provides a comprehensive survey of the main methods used in handling this annoying normalising constant Z(β), namely pseudo-likelihood, the exchange algorithm, path sampling (a.k.a., thermal integration), and ABC. A massive simulation experiment with individual simulation times up to 400 hours leads to selecting path sampling (what else?!) as the (XL) method of choice, thanks to a pre-computation of the expectation of the sufficient statistic E[S(Z)|β]. I just wonder why the same was not done for ABC, as in the recent Statistics and Computing paper we wrote with Matt and Kerrie. As it happens, I was actually discussing yesterday at Columbia potential, if huge, improvements in processing Ising and Potts models by approximating first the distribution of S(X) for some or all β before launching ABC or the exchange algorithm. (In fact, this is a more generic desideratum for all ABC methods, in that simulating directly, if approximately, the summary statistics would bring huge gains in computing time, thus possibly in final precision.) Simulating the distribution of the summary and sufficient Potts statistic S(X) reduces to simulating this distribution with a null correlation, as exploited in Cucala and Marin (2013, JCGS, Special ICMS issue). However, there does not seem to be an efficient way to do so, i.e. without reverting to simulating the entire grid X…
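
To make the path sampling idea concrete, here is a minimal sketch, not the implementation of the paper. It assumes the usual Potts statistic S(x), the number of neighbouring pairs with equal labels, and uses the identity log Z(β) = log Z(0) + ∫₀^β E[S(X)|b] db with Z(0) = qⁿ. The grid is a tiny 2×3 toy so that E[S(X)|b] can be obtained by brute-force enumeration; in a realistic setting this expectation would be pre-computed by simulation instead. All function names are hypothetical.

```python
import itertools
import math


def neighbour_pairs(rows, cols):
    """First-order (horizontal and vertical) neighbour pairs on a rows x cols grid."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                pairs.append((i, i + 1))
            if r + 1 < rows:
                pairs.append((i, i + cols))
    return pairs


def log_z_and_mean_s(beta, rows, cols, q):
    """Exact log Z(beta) and E[S(X)|beta], enumerating all q**(rows*cols) labellings."""
    pairs = neighbour_pairs(rows, cols)
    z = 0.0
    s_weighted = 0.0
    for x in itertools.product(range(q), repeat=rows * cols):
        s = sum(x[i] == x[j] for i, j in pairs)  # number of like neighbour pairs
        w = math.exp(beta * s)
        z += w
        s_weighted += s * w
    return math.log(z), s_weighted / z


def path_sampling_log_z(beta, rows, cols, q, n_grid=201):
    """log Z(beta) via trapezoidal integration of the pre-computed curve b -> E[S(X)|b]."""
    step = beta / (n_grid - 1)
    # Pre-computation step: expectation of S on a grid of inverse temperatures.
    means = [log_z_and_mean_s(k * step, rows, cols, q)[1] for k in range(n_grid)]
    integral = step * (sum(means) - 0.5 * (means[0] + means[-1]))
    # At beta = 0 all q**n labellings have weight one, so log Z(0) = n log q.
    return rows * cols * math.log(q) + integral


exact_log_z, _ = log_z_and_mean_s(0.8, 2, 3, 3)
approx_log_z = path_sampling_log_z(0.8, 2, 3, 3)
print(exact_log_z, approx_log_z)
```

The point of the design is that the curve b → E[S(X)|b] is computed once and then reused for every value of β, which is presumably where the computing gain of a pre-computation comes from.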

## Dom Juan’s opening

Posted in Books, Kids on March 22, 2015 by xi'an

The opening lines of the Dom Juan play by Molière, a play with highly subversive undertones about free will and religion. And this ode to tobacco that may get it banned in Australia, if the recent deprogramming of Bizet’s Carmen is setting a trend! [Personal note to Andrew: neither Molière’s nor my research are or were supported by a tobacco company! Although I am not 100% sure about Molière…]

“Quoi que puisse dire Aristote et toute la philosophie, il n’est rien d’égal au tabac: c’est la passion des honnêtes gens, et qui vit sans tabac n’est pas digne de vivre. Non seulement il réjouit et purge les cerveaux humains, mais encore il instruit les âmes à la vertu, et l’on apprend avec lui à devenir honnête homme.”

Dom Juan, Molière, 1665

[Whatever Aristotle and the whole of philosophy may say, there is nothing equal to tobacco: it is the passion of upright people, and whoever lives without tobacco does not deserve to live. Not only does it gladden and purge human brains, but it also instructs souls in virtue, and with it one learns to become a gentleman.]

## independent component analysis and p-values

Posted in pictures, Running, Statistics, Travel, University life on September 8, 2014 by xi'an

Yesterday morning at the neuroscience workshop Jean-François Cardoso presented independent component analysis through a highly pedagogical and enjoyable tutorial that stressed the geometric meaning of the approach, summarised by the notion that the (ICA) decomposition

$X=AS$

of the data X seeks both independence between the columns of S and non-Gaussianity. That is, getting as far away from Gaussianity as possible. The geometric bits came from looking at the Kullback-Leibler decomposition of the log likelihood

$-\mathbb{E}[\log L(\theta|X)] = KL(P,Q_\theta) + \mathfrak{E}(P)$

where the expectation is computed under the true distribution P of the data X, Qθ is the hypothesised distribution, and $\mathfrak{E}(P)$ is the entropy of P. A fine property of this decomposition is a statistical version of Pythagoras’ theorem, namely that when the family of Qθ’s is an exponential family, the Kullback-Leibler distance decomposes into

$KL(P,Q_\theta) = KL(P,Q_{\theta^0}) + KL(Q_{\theta^0},Q_\theta)$

where θ⁰ is the expected maximum likelihood estimator of θ. (We also noticed this possibility of a decomposition in our Kullback-projection variable-selection paper with Jérôme Dupuis.) The talk by Aapo Hyvärinen this morning was related to Jean-François’ in that it used ICA all the way to a three-level representation, albeit oriented towards natural vision modelling, in connection with his book and the paper on unnormalised models recently discussed on the ‘Og.
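
As a quick numerical check of this Pythagorean identity, here is a small sketch under assumptions of my own choosing rather than anything shown in the talk: the exponential family is taken to be the Poisson family indexed by its mean, P is an arbitrary distribution on a few counts, and θ⁰ is the mean of P, i.e. the expectation under P of the Poisson maximum likelihood estimator. The function names and the numbers are purely illustrative.

```python
import math


def kl_discrete_vs_poisson(p, lam):
    """KL(P, Poisson(lam)) for P given as a dict {count: probability}."""
    total = 0.0
    for k, pk in p.items():
        log_q = k * math.log(lam) - lam - math.lgamma(k + 1)  # Poisson log-pmf
        total += pk * (math.log(pk) - log_q)
    return total


def kl_poisson(mu, lam):
    """Closed-form KL(Poisson(mu), Poisson(lam))."""
    return mu * math.log(mu / lam) - mu + lam


# Hypothetical "true" distribution P, which is not itself a Poisson.
p = {0: 0.2, 1: 0.1, 3: 0.4, 7: 0.3}
theta0 = sum(k * pk for k, pk in p.items())  # mean of P, projection onto the family
theta = 1.5                                  # any other member of the family

lhs = kl_discrete_vs_poisson(p, theta)
rhs = kl_discrete_vs_poisson(p, theta0) + kl_poisson(theta0, theta)
print(lhs, rhs)
```

The two printed values agree up to floating-point error, whatever the choice of θ, which is the "right angle" at θ⁰ in the geometric picture.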

In the afternoon, Eric-Jan Wagenmakers [who persistently and rationally fights the (ab)use of p-values and who frequently figures on Andrew’s blog] gave a warning tutorial talk about the dangers of trusting p-values and going fishing for significance in existing studies, much in the spirit of Andrew’s blog (except for the defence of Bayes factors), arguing in favour of preregistration. The talk was full of illustrations from psychology. And included the line that ESP testing is the jester of academia, meaning that testing for whatever form of ESP should be encouraged as a way to check testing procedures. If a procedure finds a significant departure from the null in this setting, there is something wrong with it! I was then reminded that Eric-Jan was one of the authors having analysed Bem’s controversial (!) paper on the “anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms”… (And of the shocking talk by Jessica Utts on the same topic I attended in Australia two years ago.)

## ABC in Sydney [guest post #2]

Posted in pictures, Statistics, University life on July 24, 2014 by xi'an

[Here is a second guest post on the ABC in Sydney workshop, written by Chris Drovandi]

First up, Dennis Prangle presented his recent work on “Lazy ABC”, which can speed up ABC by potentially abandoning early those model simulations that do not look promising. Dennis introduces a continuation probability to ensure that the target distribution of the approach is still the ABC target of interest. In effect, the ABC likelihood is estimated to be 0 if early stopping is performed; otherwise the usual ABC likelihood is inflated by dividing by the continuation probability, ensuring an unbiased estimator of the ABC likelihood. The drawback is that the ESS (Dennis uses importance sampling) of the lazy approach will likely be less than that of usual ABC for a fixed number of simulations; but this should be offset by the reduction in time required to perform said simulations. Dennis also presented some theoretical work for optimally tuning the method, which I need more time to digest.
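
The unbiasedness argument can be illustrated with a minimal sketch. This is a toy under my own assumptions, not Dennis' implementation: the "simulator" produces two Gaussian stages whose average is the summary statistic, the observed summary is 0, and the continuation probability (0.2 when the first stage looks unpromising) is an arbitrary illustrative choice. All names are hypothetical.

```python
import math
import random


def continuation_probability(x1):
    """Illustrative rule: always continue if the first stage looks promising."""
    return 1.0 if abs(x1) <= 2.0 else 0.2


def standard_abc_weight(theta, eps, rng):
    """Usual ABC likelihood estimate: run the full simulation, then compare to 0."""
    x1 = rng.gauss(theta, 1.0)
    x2 = rng.gauss(theta, 1.0)
    return 1.0 if abs((x1 + x2) / 2.0) <= eps else 0.0


def lazy_abc_weight(theta, eps, rng):
    """Lazy ABC estimate: maybe stop after stage one, else inflate by 1/alpha."""
    x1 = rng.gauss(theta, 1.0)
    alpha = continuation_probability(x1)
    if rng.random() >= alpha:
        return 0.0  # early stopping: the ABC likelihood is estimated as 0
    x2 = rng.gauss(theta, 1.0)  # the (supposedly expensive) rest of the simulation
    hit = abs((x1 + x2) / 2.0) <= eps
    return 1.0 / alpha if hit else 0.0


def estimate(weight_fn, theta, eps, n, seed):
    """Monte Carlo average of the weights for a fixed parameter value."""
    rng = random.Random(seed)
    return sum(weight_fn(theta, eps, rng) for _ in range(n)) / n


def exact_abc_likelihood(theta, eps):
    """P(|(x1 + x2) / 2| <= eps), the average being N(theta, 1/2)."""
    sd = math.sqrt(0.5)

    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return cdf((eps - theta) / sd) - cdf((-eps - theta) / sd)


print(estimate(standard_abc_weight, 1.0, 0.5, 100_000, seed=1))
print(estimate(lazy_abc_weight, 1.0, 0.5, 100_000, seed=2))
print(exact_abc_likelihood(1.0, 0.5))
```

Both Monte Carlo averages target the same ABC likelihood, but the lazy weights are more variable since they occasionally take the value 1/α, in line with the loss in ESS mentioned above.
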
This was followed by my talk on Bayesian indirect inference methods that use a parametric auxiliary model (a slightly older version here). This paper has just been accepted by Statistical Science.
Morning tea was followed by my PhD student, Brenda Vo, who presented an interesting application of ABC to cell spreading experiments. Here an estimate of the diameter of the cell population was used as a summary statistic. It was noted after Brenda’s talk that this application might be a good candidate for Dennis’ Lazy ABC idea. This talk was followed by a much more theoretical presentation by Pierre del Moral on how particle filter methodologies can be adapted to the ABC setting and also a general framework for particle methods.
Following lunch, Guilherme Rodrigues presented a hierarchical Gaussian Process model for kernel density estimation in the presence of different subgroups. Unfortunately my (lack of) knowledge on non-parametric methods prevents me from making any further comment except that the model looked very interesting and ABC seemed a good candidate for calibrating the model. I look forward to the paper appearing on-line.
The next presentation was by Gael Martin who spoke about her research on using ABC for estimation of complex state space models. This was probably my favourite talk of the day, and not only because it is very close to my research interests. Here the score of the Euler discretised approximation of the generative model was used as summary statistics for ABC. From what I could gather, it was demonstrated that the ABC posteriors based on the score or on the MLE of the auxiliary model are the same in the limit as ε → 0 (unless I have mis-interpreted). This is a very useful result in itself; using the score to avoid an optimisation required for the MLE can save a lot of computation. The improved approximations of the proposed approach compared with the results that use the likelihood of the Euler discretisation were quite promising. I am certainly looking forward to this paper coming out.
Matt Moores drew the short straw and had the final presentation on the Friday afternoon. Matt spoke about this paper (an older version is available here), of which I am now a co-author. Matt’s idea is that doing some pre-simulations across the prior space and determining a mapping between the parameter of interest and the mean and variance of the summary statistic can significantly speed up ABC for the Potts model, and potentially other ABC applications. The results of the pre-computation step are used in the main ABC algorithm, which no longer requires simulation of pseudo-data; instead, a summary statistic is simulated from the auxiliary model fitted in the pre-processing step. Whilst this approach does introduce a couple more layers of approximation, the gain in computation time was up to two orders of magnitude. The talks by Matt, Gael and myself gave a real indirect inference flavour to this year’s ABC in…
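
The pre-computation idea can be sketched on a toy example. This is only an illustration under my own assumptions, not the algorithm of the paper: the "expensive" simulator is a binomial count standing in for the Potts statistic, the mapping from the parameter to the mean and standard deviation of the summary is a simple linear interpolation over a grid, and the auxiliary model is Gaussian. All names and tuning values are hypothetical.

```python
import math
import random

N_TRIALS = 100  # size of the toy dataset


def simulate_summary(theta, rng):
    """Stand-in for an expensive simulator: successes in N_TRIALS Bernoulli draws."""
    return sum(rng.random() < theta for _ in range(N_TRIALS))


def precompute(grid, n_rep, rng):
    """Pre-simulation: mean and standard deviation of the summary on a grid."""
    table = []
    for theta in grid:
        draws = [simulate_summary(theta, rng) for _ in range(n_rep)]
        mean = sum(draws) / n_rep
        var = sum((d - mean) ** 2 for d in draws) / (n_rep - 1)
        table.append((theta, mean, math.sqrt(var)))
    return table


def interpolate(table, theta):
    """Linear interpolation of (mean, sd) between neighbouring grid points."""
    for (t0, m0, s0), (t1, m1, s1) in zip(table, table[1:]):
        if t0 <= theta <= t1:
            w = (theta - t0) / (t1 - t0)
            return m0 + w * (m1 - m0), s0 + w * (s1 - s0)
    raise ValueError("theta outside the pre-computed grid")


def abc_with_surrogate(s_obs, eps, n_draws, table, rng):
    """ABC rejection where the summary is drawn from the fitted Gaussian surrogate."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # uniform prior on [0, 1)
        mean, sd = interpolate(table, theta)
        s_sim = rng.gauss(mean, sd)  # no pseudo-data is simulated here
        if abs(s_sim - s_obs) <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
grid = [k / 20 for k in range(21)]
table = precompute(grid, 200, rng)
sample = abc_with_surrogate(30, 2.0, 20_000, table, rng)
print(len(sample), sum(sample) / len(sample))
```

In this toy the exact posterior is a Beta(31, 71) distribution, so the ABC posterior mean can be compared with 31/102; the simulator is only called in `precompute`, never inside the ABC loop.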

## ABC in Sydney [guest post]

Posted in pictures, Statistics, University life on July 18, 2014 by xi'an

[Scott Sisson sent me this summary of the ABC in Sydney meeting that took place two weeks ago.]

Following on from ABC in Paris (2009), ABC in London (2011) and ABC in Rome (2013), the fourth instalment of the international workshops in Approximate Bayesian Computation (ABC) was held at UNSW in Sydney on 3rd-4th July 2014. The first antipodean workshop was held as a satellite to the huge (>550 registrations) IMS-ASC-2014 International Conference, also held in Sydney the following week.

ABC in Sydney was created in two parts. The first, on the Thursday, was held as an “introduction to ABC” for people who were interested to find out more about the subject, but who had not particularly been exposed to the area before. Rather than have a single brave individual give the introductory course over several hours, the expository presentation was “crowdsourced” from several experienced researchers in the field, with each being given 30 minutes to present on a particular aspect of ABC. In this way, Matthew Moores (QUT), Dennis Prangle (Reading), Chris Drovandi (QUT), Zach Aandahl (UNSW) and Scott Sisson (UNSW) covered the ABC basics over the course of 6 presentations and 3 hours.

The second part of the workshop, on Friday, was the more usual collection of research oriented talks. In the morning session, Dennis Prangle spoke about “lazy ABC,” a method of stopping the generation of computationally demanding dataset simulations early, and Chris Drovandi discussed theoretical and practical aspects of Bayesian indirect inference. This was followed by Brenda Nho Vo (QUT) presenting an application of ABC in stochastic cell spreading models, and by Pierre Del Moral (UNSW) who demonstrated many theoretical aspects of ABC in interacting particle systems. After lunch Guilherme Rodrigues (UNSW) proposed using ABC for Gaussian process density estimation (and introduced the infinite-dimensional functional regression adjustment), and Gael Martin (Monash) spoke on the issues involved in applying ABC to state space models. The final talk of the day was given by Matthew Moores who discussed how online ABC dataset generation could be circumvented by pre-computation for particular classes of models.

In all, over 100 people registered for and attended the workshop, making it an outstanding success. Of course, this was helped by the association with the following large conference, and the pricing scheme (completely free!) following the tradition of the previous workshops. Morning and afternoon teas, described as “the best workshop food ever!” by several attendees, were paid for by the workshop sponsors: the Bayesian Section of the Statistical Society of Australia, and the ARC Centre of Excellence in Mathematical and Statistical Frontiers.

Here’s looking forward to the next workshop in the series!

## Statistical modeling and computation [apologies]

Posted in Books, R, Statistics, University life on June 11, 2014 by xi'an

In my book review of the recent book by Dirk Kroese and Joshua Chan, Statistical Modeling and Computation, I mistakenly and persistently typed the name of the second author as Joshua Chen. This typo alas made it to the printed and on-line versions of the subsequent CHANCE 27(2) column. I am thus very sorry for this mistake of mine and most sincerely apologise to the authors. Indeed, it always annoys me to have my name mistyped (usually as Roberts!) in references. [If nothing else, this typo signals it is high time for a change of my prescription glasses.]