In the newspaper I grabbed in the corridor to my plane today (flying to Bristol to attend the SuSTaIn workshop on “High-dimensional Stochastic Simulation and Optimisation in Image Processing”, to which I was kindly invited and most readily accepted the invitation), I found a two-page entry on estimating the number of homeless deaths using capture-recapture. Besides the sheer concern about the very high mortality rate among homeless persons (a life expectancy of 48 years, and around 7,000 deaths in France between 2008 and 2010) and the dreadful realisation that an increasing number of kids are dying in the streets, I was obviously interested in this use of capture-recapture methods, as I had briefly interacted with researchers from INED working on estimating the number of (living) homeless persons about 15 years ago. Glancing at the original paper once I had landed, there was alas no methodological innovation in the approach, which was based on the simplest maximum likelihood estimate. I wonder whether more advanced models and [Bayesian] methods of inference could [or should] be used on such data, for instance by introducing covariates in the process, such as conditioning the probability of (cross-)detection on the cause of death.
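For readers unfamiliar with the method, here is a minimal sketch of the kind of two-list maximum likelihood estimate such studies rely on, namely the Lincoln-Petersen estimator; the counts below are made up for illustration and have nothing to do with the actual study.

```python
# Hypothetical two-source capture-recapture sketch (Lincoln-Petersen).
# All counts are invented for illustration, not taken from the paper.

def lincoln_petersen(n1, n2, m):
    """MLE of total population size from two overlapping lists.

    n1: cases on list 1, n2: cases on list 2, m: cases on both lists.
    """
    if m == 0:
        raise ValueError("no overlap between lists: estimator undefined")
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected variant, better behaved when m is small."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# e.g. 300 deaths recorded by one source, 200 by another, 60 by both:
print(lincoln_petersen(300, 200, 60))   # 1000.0
print(round(chapman(300, 200, 60), 1))
```

Introducing a covariate such as the cause of death would amount to letting the detection probabilities behind these counts vary across strata, which is exactly where richer (log-linear or Bayesian) models come in.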
With Matt Moores and Kerrie Mengersen, from QUT, we wrote this short paper just in time for the MCMSki IV Special Issue of Statistics & Computing. And arXived it, as well. The global idea is to cut down on the cost of running an ABC experiment by removing the simulation of a humongous state-space vector, as in Potts and hidden Potts models, and replacing it with an approximate simulation of the 1-d sufficient (summary) statistic. In our case, we divided the 1-d parameter interval into a grid, simulated the distribution of the sufficient statistic at each grid value, and computed its expectation and variance. The conditional distribution of the sufficient statistic is then approximated by a Gaussian with these two parameters, and those Gaussian approximations substitute for the true distributions within an ABC-SMC algorithm à la Del Moral, Doucet and Jasra (2012).
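To make the idea concrete, here is a hedged toy sketch (not the authors' code): the expensive Potts simulation is replaced by a cheap stand-in model, the mean and standard deviation of the summary statistic are precomputed on a parameter grid, and a plain ABC rejection step (rather than the SMC sampler of the paper) then draws from the Gaussian surrogate instead of simulating pseudo-data.

```python
# Toy sketch of the precomputed-mapping idea; all names and numbers are
# mine. A cheap stand-in replaces the Potts sufficient statistic: here
# S | beta ~ Poisson(50 * beta), and we pretend simulating it is costly,
# so we precompute mean/sd of S on a grid of beta values.

import numpy as np

rng = np.random.default_rng(0)

# offline step: simulate S at each grid value, record mean and sd
grid = np.linspace(0.1, 2.0, 20)
means = np.empty_like(grid)
sds = np.empty_like(grid)
for i, b in enumerate(grid):
    s = rng.poisson(50 * b, size=200)
    means[i] = s.mean()
    sds[i] = s.std()

def surrogate_draw(beta):
    """Draw a summary statistic from the Gaussian N(mu(beta), sd(beta)),
    interpolating the precomputed grid instead of simulating pseudo-data."""
    mu = np.interp(beta, grid, means)
    sd = np.interp(beta, grid, sds)
    return rng.normal(mu, sd)

# online step: plain ABC rejection using the Gaussian surrogate
s_obs = 60.0          # observed summary statistic (toy value)
eps = 2.0             # ABC tolerance
accepted = []
while len(accepted) < 500:
    beta = rng.uniform(0.1, 2.0)          # draw from a uniform prior
    if abs(surrogate_draw(beta) - s_obs) < eps:
        accepted.append(beta)

print(np.mean(accepted))  # concentrates near s_obs / 50 = 1.2
```

The whole point is that the offline grid simulation is done once, after which every ABC proposal costs only an interpolation and a Gaussian draw, rather than a fresh sweep over a huge lattice.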
Across twenty simulated 125 × 125 pixel images, Matt’s algorithm took an average of 21 minutes per image for between 39 and 70 SMC iterations, while resorting to pseudo-data and deriving the genuine sufficient statistic took an average of 46.5 hours for 44 to 85 SMC iterations. On a realistic Landsat image with a total of 978,380 pixels, the precomputation of the mapping function took 50 minutes, while the total CPU time on 16 parallel threads was 10 hours 38 minutes. By comparison, it took 97 hours to run 10,000 MCMC iterations on this image, with a poor effective sample size of 390 values. Regular SMC-ABC algorithms simply cannot handle this scale: it takes 89 hours to perform a single SMC iteration! (Note that path sampling also operates in this framework, thanks to the same precomputation: in that case it took 2.5 hours for 10⁵ iterations, with an effective sample size of 10⁴…)
Since my student’s paper on Seaman et al (2012) got promptly rejected by TAS for quoting too extensively from my post, we decided to include me as an extra author and submitted the paper to this special issue as well.
The dataset used in Bayesian Core for the chapter on image processing is a Landsat picture of Lake of Menteith in Scotland (close to Loch Lomond). (Yes, Lake of Menteith, not Loch Menteith!) Here is the image produced in the book. I just got an email from Matt Moores at QUT that the image is both rotated and flipped:
The image of Lake of Menteith in figure 8.6 of Bayesian Core is upside-down and back-to-front, so to speak. Also, I recently read a paper by Lionel Cucala & J-M Marin that has the same error.
This is due to the difference between matrix indices and image coordinates: matrices in R are indexed by [row, column], while image coordinates are (x, y). Moreover, row 1 is the top row of the matrix, but y = 1 corresponds to the bottom row of pixels in an image.
Only a one-line change to the R code is required to display the image in the correct orientation:

image(1:100, 1:100, t(as.matrix(lm3)[100:1,]), col=gray(256:1/256), xlab="", ylab="")
As can be checked on Google Maps, the picture is indeed rotated by a −90° angle and the transpose correction does the job!
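The indexing clash is easy to reproduce outside R; the following numpy sketch (the 2×2 toy matrix and names are mine) shows why reversing the rows and then transposing restores the picture.

```python
# Minimal numpy illustration of the indexing clash described above:
# matrices are indexed [row, column] with row 1 at the top, while image
# coordinates are (x, y) with y = 1 at the bottom.

import numpy as np

m = np.array([[1, 2],
              [3, 4]])   # matrix: row 0 = (1, 2) is the TOP row

# R's image() plots z[i, j] at (x_i, y_j): the first matrix row becomes
# the leftmost column of pixels and y runs bottom-to-top. Reversing the
# rows and transposing compensates for both effects:
corrected = m[::-1, :].T

print(corrected)
# [[3 1]
#  [4 2]]
```

Feeding `corrected` to image() then puts 1 and 2 back on the top row and 3 and 4 on the bottom, as in the original picture, which is exactly what the one-line R fix above achieves.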
In order to make advances in the processing of their datasets and experiments, and in the understanding of the fundamental parameters driving the general relativity model, cosmologists are launching a competition called the GREAT08 challenge through the Pascal European network. Details about the challenge are available in the arXiv:0802.1214 document, the model being clearly defined from a statistical point of view as a combination of lensing shear (the phenomenon of interest) and of various (namely three) convolution noises that make the analysis so challenging, and the data being a collection of images of galaxies. The fundamental problem is to identify a 2-d linear distortion applied to all images within a certain region of the sky, up (or down) to a precision of 0.003, the distortion being identified thanks to an isotropy assumption on the un-distorted images. The solution must be efficient too, in that it is to be tested on 27 million galaxies! A standard MCMC mixture analysis on each galaxy is thus unlikely to converge before the challenge is over, next April. I think the challenge is worth considering by statistical teams, even though this represents a considerable involvement over the next six months…
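As a hedged illustration of why such precisions are conceivable at all (this is the textbook weak-lensing averaging argument, not the challenge’s actual pipeline, and all numbers are invented), here is a toy sketch where a common shear is recovered as the mean observed ellipticity under an isotropy assumption on the intrinsic ellipticities.

```python
# Toy sketch of the averaging idea behind shear estimation (not the
# GREAT08 pipeline): each observed ellipticity is modelled as intrinsic
# ellipticity plus a common shear (g1, g2), ignoring the convolution
# noises; if intrinsic ellipticities are isotropic with mean zero, the
# sample average of observed ellipticities estimates the shear.

import numpy as np

rng = np.random.default_rng(1)

g_true = np.array([0.03, -0.01])       # common shear in one sky region
n = 200_000                            # galaxies in that region
intrinsic = rng.normal(0.0, 0.3, size=(n, 2))   # isotropic scatter
observed = intrinsic + g_true          # crude additive model

g_hat = observed.mean(axis=0)          # shear estimate
print(g_hat)                           # close to (0.03, -0.01)

# the statistical error scales as 0.3 / sqrt(n): with n = 2e5 it is
# already below 1e-3, which is why millions of galaxies are needed to
# certify a precision of 0.003 including systematics
print(0.3 / np.sqrt(n))
```

The hard part of the challenge is of course everything this sketch ignores: the three convolution noises, which bias a naive average and rule out treating each galaxy with its own MCMC run.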