Archive for galaxies

big data, big models, it is a big deal! [posters & talks]

Posted in Books, Kids, pictures, Statistics, Travel, University life on September 3, 2014 by xi'an

Great poster session last night and at lunch today. Saw an ABC poster (by Dennis Prangle, following our random forest paper) and several MCMC posters (by Marco Banterle, who actually won one of the speed-meeting mini-project awards!, Michael Betancourt, Anne-Marie Lyne, Murray Pollock), and then a rather different poster on Mondrian forests, which generalise random forests to sequential data (by Balaji Lakshminarayanan). The talks all had interesting aspects or glimpses about big data and some of the unnecessary hype about it (them?!), along with exposing Amazon's nefarious ambition to become the Earth's only seller!, but I particularly enjoyed the astronomy afternoon and even more particularly Steve Roberts' sweep through astronomy machine-learning. Steve characterised variational Bayes as picking your choice of sufficient statistics, which made me wonder why there were no stronger connections between variational Bayes and ABC. He also quoted the book The Fourth Paradigm: Data-Intensive Scientific Discovery by Tony Hey as putting forward interesting notions. (A book review for the next vacations?!) And he also mentioned zooniverse, a citizen science website I was not aware of, with a Bayesian analysis of the learning curve of those annotating citizens (in the case of supernovae classification). Big deal, indeed!!!

The great’08 Pascal challenge

Posted in Statistics on October 8, 2008 by xi'an

In order to make advances in the processing of their datasets and experiments, and in the understanding of the fundamental parameters driving the general relativity model, cosmologists are launching a competition called the great’08 challenge through the Pascal European network. Details about the challenge are available in an arXiv:0802.1214 document, the model being clearly defined from a statistical point of view as a combination of lensing shear (the phenomenon of interest) and of various (=three) convolution noises that make the analysis so challenging, and the data being a collection of images of galaxies. The fundamental problem is to identify a 2d-linear distortion applied to all images within a certain region of space, up (or down) to a precision of 0.003, the distortion being identified by an isotropy assumption over the undistorted images. The solution must be efficient too, in that it is to be tested on 27 million galaxies! A standard MCMC mixture analysis on each galaxy is thus unlikely to converge before the challenge is over, next April. I think the challenge is worth considering by statistical teams, even though this represents a considerable involvement over the next six months….
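To make the identification argument concrete, here is a minimal toy sketch of why an isotropy assumption on the undistorted galaxies pins down a common shear. It is not the challenge's model: it ignores the convolution and pixel noise altogether, and the complex-ellipticity shear map, the uniform distribution of the intrinsic ellipticity modulus, the shear value, and every function name are assumptions made for illustration only.

```python
import cmath
import math
import random


def shear_ellipticity(e_int: complex, g: complex) -> complex:
    """Apply a shear g to an intrinsic complex ellipticity e_int.

    Assumed convention: the standard weak-lensing map for |g| < 1,
    e_obs = (e_int + g) / (1 + conj(g) * e_int).
    """
    return (e_int + g) / (1 + g.conjugate() * e_int)


def simulate_mean_ellipticity(g: complex, n_galaxies: int, seed: int = 0) -> complex:
    """Average the sheared ellipticities of n_galaxies toy galaxies.

    Intrinsic orientations are uniform (the isotropy assumption); the
    modulus is drawn uniformly on [0, 0.6], an arbitrary toy choice.
    """
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_galaxies):
        modulus = rng.uniform(0.0, 0.6)
        angle = rng.uniform(0.0, math.pi)
        # An ellipticity is a spin-2 quantity, hence the factor 2 in the phase.
        e_int = modulus * cmath.exp(2j * angle)
        total += shear_ellipticity(e_int, g)
    return total / n_galaxies


# Hypothetical shear, common to all galaxies in one region of the sky.
true_shear = 0.03 - 0.02j
estimate = simulate_mean_ellipticity(true_shear, 100_000)
error = abs(estimate - true_shear)
print(f"estimate = {estimate:.4f}, |error| = {error:.4f}")
```

Under this map, averaging over uniformly distributed orientations returns the shear itself, so the plain mean of the observed ellipticities is an unbiased estimator in this noise-free toy setting, with an error that shrinks like one over the square root of the number of galaxies. The actual difficulty of the challenge lies in everything this sketch leaves out, namely recovering ellipticities from blurred and noisy images.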