## independent component analysis and p-values

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , on September 8, 2014 by xi'an

Last morning at the neuroscience workshop Jean-François Cardoso presented independent component analysis though a highly pedagogical and enjoyable tutorial that stressed the geometric meaning of the approach, summarised by the notion that the (ICA) decomposition

$X=AS$

of the data X seeks both independence between the columns of S and non-Gaussianity. That is, getting as away from Gaussianity as possible. The geometric bits came from looking at the Kullback-Leibler decomposition of the log likelihood

$-\mathbb{E}[\log L(\theta|X)] = KL(P,Q_\theta) + \mathfrak{E}(P)$

where the expectation is computed under the true distribution P of the data X. And Qθ is the hypothesised distribution. A fine property of this decomposition is a statistical version of Pythagoreas’ theorem, namely that when the family of Qθ‘s is an exponential family, the Kullback-Leibler distance decomposes into

$KL(P,Q_\theta) = KL(P,Q_{\theta^0}) + KL(Q_{\theta^0},Q_\theta)$

where θ⁰ is the expected maximum likelihood estimator of θ. (We also noticed this possibility of a decomposition in our Kullback-projection variable-selection paper with Jérôme Dupuis.) The talk by Aapo Hyvärinen this morning was related to Jean-François’ in that it used ICA all the way to a three-level representation if oriented towards natural vision modelling in connection with his book and the paper on unormalised models recently discussed on the ‘Og.

On the afternoon, Eric-Jan Wagenmaker [who persistently and rationally fight the (ab)use of p-values and who frequently figures on Andrew’s blog] gave a warning tutorial talk about the dangers of trusting p-values and going fishing for significance in existing studies, much in the spirit of Andrew’s blog (except for the defence of Bayes factors). Arguing in favour of preregistration. The talk was full of illustrations from psychology. And included the line that ESP testing is the jester of academia, meaning that testing for whatever form of ESP should be encouraged as a way to check testing procedures. If a procedure finds a significant departure from the null in this setting, there is something wrong with it! I was then reminded that Eric-Jan was one of the authors having analysed Bem’s controversial (!) paper on the “anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms”… (And of the shocking talk by Jessica Utts on the same topic I attended in Australia two years ago.)

## ABC in Sydney [guest post #2]

Posted in pictures, Statistics, University life with tags , , , on July 24, 2014 by xi'an

[Here is a second guest post on the ABC in Sydney workshop, written by Chris Drovandi]

First up Dennis Prangle presented his recent work on “Lazy ABC”, which can speed up ABC by potentially abandoning model simulations early that do not look promising. Dennis introduces a continuation probability to ensure that the target distribution of the approach is still the ABC target of interest. In effect, the ABC likelihood is estimated to be 0 if early stopping is performed otherwise the usual ABC likelihood is inflated by dividing by the continuation probability, ensuring an unbiased estimator of the ABC likelihood. The drawback is that the ESS (Dennis uses importance sampling) of the lazy approach will likely be less than usual ABC for a fixed number of simulations; but this should be offset by the reduction in time required to perform said simulations. Dennis also presented some theoretical work for optimally tuning the method, which I need more time to digest.
This was followed by my talk on Bayesian indirect inference methods that use a parametric auxiliary model (a slightly older version here). This paper has just been accepted by Statistical Science.
Morning tea was followed by my PhD student, Brenda Vo, who presented an interesting application of ABC to cell spreading experiments. Here an estimate of the diameter of the cell population was used as a summary statistic. It was noted after Brenda’s talk that this application might be a good candidate for Dennis’ Lazy ABC idea. This talk was followed by a much more theoretical presentation by Pierre del Moral on how particle filter methodologies can be adapted to the ABC setting and also a general framework for particle methods.
Following lunch, Guilherme Rodrigues presented a hierarchical Gaussian Process model for kernel density estimation in the presence of different subgroups. Unfortunately my (lack of) knowledge on non-parametric methods prevents me from making any further comment except that the model looked very interesting and ABC seemed a good candidate for calibrating the model. I look forward to the paper appearing on-line.
The next presentation was by Gael Martin who spoke about her research on using ABC for estimation of complex state space models. This was probably my favourite talk of the day, and not only because it is very close to my research interests. Here the score of the Euler discretised approximation of the generative model was used as summary statistics for ABC. From what I could gather, it was demonstrated that the ABC posterior based on the score or the MLE of the auxiliary model were the same in the limit as ε 0 (unless I have mis-interpreted). This is a very useful result in itself; using the score to avoid an optimisation required for the MLE can save a lot of computation. The improved approximations of the proposed approach compared with the results that use the likelihood of the Euler discretisation were quite promising. I am certainly looking forward to this paper coming out.
Matt Moores drew the short straw and had the final presentation on the Friday afternoon. Matt spoke about this paper (an older version is available here), of which I am now a co-author. Matt’s idea is that doing some pre-simulations across the prior space and determining a mapping between the parameter of interest and the mean and variance of the summary statistic can significantly speed up ABC for the Potts model, and potentially other ABC applications. The results of the pre-computation step are used in the main ABC algorithm, which no longer requires simulation of pseudo-data but rather a summary statistic can be simulated from the fitted auxiliary model in the pre-processing step. Whilst this approach does introduce a couple more layers of approximation, the gain in computation time was up to two orders of magnitude. The talks by Matt, Gael and myself gave a real indirect inference flavour to this year’s ABC in…

## ABC in Sydney [guest post]

Posted in pictures, Statistics, University life with tags , , , on July 18, 2014 by xi'an

[Scott Sisson sent me this summary of the ABC in Sydney meeting that took place two weeks ago.]

Following on from ABC in Paris (2009), ABC in London (2011) and ABC in Rome (2013), the fourth instalment of the international workshops in Approximate Bayesian Computation (ABC) was held at UNSW in Sydney on 3rd-4th July 2014. The first antipodean workshop was held as a satellite to the huge (>550 registrations) IMS-ASC-2014 International Conference, also held in Sydney the following week.

ABC in Sydney was created in two parts. The first, on the Thursday, was held as an “introduction to ABC” for people who were interested to find out more about the subject, but who had not particularly been exposed to the area before. Rather than have a single brave individual give the introductory course over several hours, the expository presentation was “crowdsourced” from several experienced researchers in the field, with each being given 30 minutes to present on a particular aspect of ABC. In this way, Matthew Moores (QUT), Dennis Prangle (Reading), Chris Drovandi (QUT), Zach Aandahl (UNSW) and Scott Sisson (UNSW) covered the ABC basics over the course of 6 presentations and 3 hours.

The second part of the workshop, on Friday, was the more usual collection of research oriented talks. In the morning session, Dennis Prangle spoke about “lazy ABC,” a method of stopping the generation of computationally demanding dataset simulations early, and Chris Drovandi discussed theoretical and practical aspects of Bayesian indirect inference. This was followed by Brenda Nho Vo (QUT) presenting an application of ABC in stochastic cell spreading models, and by Pierre Del Moral (UNSW) who demonstrated many theoretical aspects of ABC in interacting particle systems. After lunch Guilherme Rodrigues (UNSW) proposed using ABC for Gaussian process density estimation (and introduced the infinite-dimensional functional regression adjustment), and Gael Martin (Monash) spoke on the issues involved in applying ABC to state space models. The final talk of the day was given by Matthew Moores who discussed how online ABC dataset generation could be circumvented by pre-computation for particular classes of models.

In all, over 100 people registered for and attended the workshop, making it an outstanding success. Of course, this was helped by the association with the following large conference, and the pricing scheme — completely free! — following the tradition of the previous workshops. Morning and afternoon teas, described as “the best workshop food ever!” by several attendees, was paid for by the workshop sponsors: the Bayesian Section of the Statistical Society of Australia, and the ARC Centre of Excellence in Mathematical and Statistical Frontiers.

Here’s looking forward to the next workshop in the series!

## Statistical modeling and computation [apologies]

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , on June 11, 2014 by xi'an

In my book review of the recent book by Dirk Kroese and Joshua Chan,  Statistical Modeling and Computation, I mistakenly and persistently typed the name of the second author as Joshua Chen. This typo alas made it to the printed and on-line versions of the subsequent CHANCE 27(2) column. I am thus very much sorry for this mistake of mine and most sincerely apologise to the authors. Indeed, it always annoys me to have my name mistyped (usually as Roberts!) in references.  [If nothing else, this typo signals it is high time for a change of my prescription glasses.]

## Flaggermusmannen [book review]

Posted in Books, Kids, Travel with tags , , , , , , on May 25, 2014 by xi'an

“Cold, concise statistics. Keyword number one is statistical significance. In other words, we are looking for a system that cannot be explained by statistical chance (…) this group constitutes less than five percent of the female population. Yet I was left with seven murders and over forty rapes.” J. Nesbø

Another first novel! The Bat (Flaggermusmannen) by  Jo Nesbø has been sitting in my bedside book pile for quite a while, until I decided to read it a few days ago. It is the first appearance of Inspector Harry Hole in a published book and was written in 1997, although translated into English much much later. (The book was nominated as Best Norwegian Crime Novel of the Year and as Best Nordic Crime Novel of the Year.)

“Life consists of a series of quite improbable chance occurrences (…) What bothers me is that I’ve got that lottery number too many times in a row.” J. Nesbø

I read the (later) novel The Redeemer a few years ago, taking place mostly in Norway and kept a globally positive impression about the book, even though the plot was a bit stretched… The Bat has somewhat the same defects as The Ice Princess in that it sounds too much like an exercise in thriller writing, albeit in a much less clumsy style! The central character of Harry Hole is well-done, in an engaging-despite-his-shortcomings style and the way he gets along with most of the people he meets is rather realistic. However, the setting of the first novel in Australia (rather than Norway) is sort of a failure in that the country and Sydney are more caricatures than realistic in any degree…. For instance, every aboriginal Harry meets must resort to traditional tales involving emus and lizards and other local animals. One such tale would be ok but so many are just a bore. The title itself is connected to yet another aboriginal myth. And to the murders occurring way too often in the novel. Similarly, every foreign backpacker met in the pages of The Bat is either dumb or on her way to become a waitress to recover from a failed love affair. And a major character is a transvestite playing in a theatre, maybe because Nesbø has watched Priscilla Queen of the Desert a few years earlier… And the Australian police officers sound both very heavy in colloquialism and quite light in detective skills. Lacking an obvious connection to a series of young women murders throughout Australia. The second part of the novel gets too artificial to remain gripping and I completed the book with a feeling of chore accomplished…, not of surprise or shock at the resolution of the murders. I thus concur with many other readers of the book that it is certainly far from being the best in the series!

## ABC in Sydney, July 3-4, 2014!!!

Posted in pictures, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , on February 12, 2014 by xi'an

After ABC in Paris in 2009, ABC in London in 2011, and ABC in Roma last year, things are accelerating since there will be—as I just learned—  an ABC in Sydney next July (not June as I originally typed, thanks Robin!). The workshop on the current developments of ABC methodology thus leaves Europe to go down-under and to take advantage of the IMS Meeting in Sydney on July 7-10, 2014. Hopefully, “ABC in…” will continue its tour of European capitals in 2015! To keep up with an unbroken sequence of free workshops, Scott Sisson has managed to find support so that attendance is free of charge (free as in “no registration fee at all”!) but you do need to register as space is limited. While I would love to visit UNSW and Sydney once again and attend the workshop, I will not, getting ready for Cancún and our ABC short course there.

## Statistical modeling and computation [book review]

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , , on January 22, 2014 by xi'an

Dirk Kroese (from UQ, Brisbane) and Joshua Chan (from ANU, Canberra) just published a book entitled Statistical Modeling and Computation, distributed by Springer-Verlag (I cannot tell which series it is part of from the cover or frontpages…) The book is intended mostly for an undergrad audience (or for graduate students with no probability or statistics background). Given that prerequisite, Statistical Modeling and Computation is fairly standard in that it recalls probability basics, the principles of statistical inference, and classical parametric models. In a third part, the authors cover “advanced models” like generalised linear models, time series and state-space models. The specificity of the book lies in the inclusion of simulation methods, in particular MCMC methods, and illustrations by Matlab code boxes. (Codes that are available on the companion website, along with R translations.) It thus has a lot in common with our Bayesian Essentials with R, meaning that I am not the most appropriate or least unbiased reviewer for this book. Continue reading