Archive for histogram
And so the second day of our workshop Advances in Scalable Bayesian Computation is already gone! This time, the “main” theme seemed to be brains… Indeed, Simon Barthelmé’s research originated in neuroscience, while Dawn Woodard dissected a brain (via MRI) during her talk! (Note that the BIRS website currently posts Simon’s video as Dan Simpson’s talk, the late change in schedule being due to Dan most unfortunately losing his passport during a plane transfer and hence being prevented from attending…) I found Simon’s talk quite inspiring. Central to his method is the trick of Tibshirani et al. of turning density estimation into a classification problem solved by logistic regression, which suggests a completely different vista for handling normalising constants… Then Raazesh Sainudiin gave a detailed explanation and validation of his approach to density estimation by multidimensional pavings/histograms, with a tree representation allowing for fast merging of different estimators. Raaz had given a preliminary version of the talk at CREST last Fall, which helped with focussing on the statistical aspects of the method. Chris Strickland then presented an image analysis of flooded Northern Queensland landscapes, using a spatio-temporal model with changepoints and about 18,000 parameters, while still managing to get an efficiency of O(np) thanks to two tricks. Then it was time for the group photograph outside in a balmy -18° and for an open research time that was quite productive.
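To make the classification trick concrete, here is a minimal sketch under my own assumptions (it is not Simon’s actual method). The data are pooled with a sample from a known reference density g, and a logistic regression is fitted to tell the two apart. Since the fitted log-odds estimate log f(x) − log g(x) + log(n_data/n_ref), adding log g and subtracting the sample-size term recovers log f. The normalising constant of f is thus never computed explicitly. The 1-D Gaussian data, the Uniform(−5, 5) reference, the quadratic features and all function names are hypothetical choices for illustration.

```python
import numpy as np

def features(x):
    # Quadratic features: enough to represent log(normal / uniform) exactly
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.column_stack([np.ones_like(x), x, x**2])

def fit_logistic(X, y, iters=50):
    """Logistic regression fitted by plain Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def classification_log_density(data, ref_sample, ref_logpdf):
    """Estimate log f by classifying data (label 1) against a reference sample (label 0)."""
    x = np.concatenate([data, ref_sample])
    y = np.concatenate([np.ones(len(data)), np.zeros(len(ref_sample))])
    beta = fit_logistic(features(x), y)
    log_ratio_n = np.log(len(data) / len(ref_sample))

    def log_density(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        # fitted logit = log f(t) - log g(t) + log(n_data / n_ref)
        return features(t) @ beta + ref_logpdf(t) - log_ratio_n

    return log_density

rng = np.random.default_rng(0)
data = rng.normal(size=20_000)              # "unknown" density f: here N(0, 1)
ref = rng.uniform(-5.0, 5.0, size=20_000)   # reference density g: Uniform(-5, 5)
log_f = classification_log_density(data, ref, lambda t: np.full(t.shape, -np.log(10.0)))
```

With this set-up, `np.exp(log_f(0.0))` should come out close to the standard normal density at zero, 1/√(2π) ≈ 0.399. In this toy case the logistic model is well specified. With a richer unknown f, the features, or a flexible classifier, would have to carry the approximation.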
In the afternoon sessions, Paul Fearnhead presented an auxiliary variable approach to particle Gibbs, which again opened new possibilities for handling state-space models, while also reminding me of Xiao-Li Meng’s reparameterisation devices. And making me wonder (out loud) whether or not the SMC algorithm was that essential in a static setting, since the sequence could be explored in any possible order for a fixed time horizon. Then Emily Fox gave a 2-for-1 talk, mostly focussing on the first part, where she introduced a new technique for approximating the gradient in Hamiltonian (or Hockey!) Monte Carlo, using second-order Langevin. She did not have much time for the second part, which intersected with the talk she gave at BNP’ski in Chamonix, but focussed on a notion of sandwiched slice sampling where the target density only needs bounds, which can be improved if needed. A cool trick! And the talks ended with Dawn Woodard’s analysis of time-varying 3-D brain images towards lesion detection, through an efficient estimation of a spatial mixture of normals.
We present a novel method for averaging a sequence of histogram states visited by a Metropolis-Hastings Markov chain whose stationary distribution is the posterior distribution over a dense space of tree-based histograms. The computational efficiency of our posterior mean histogram estimate relies on a statistical data-structure that is sufficient for non-parametric density estimation of massive, multi-dimensional metric data. This data-structure is formalized as a statistical regular paving (SRP). A regular paving (RP) is a binary tree obtained by selectively bisecting boxes along their first widest side. SRP augments RP by mutably caching the recursively computable sufficient statistics of the data. The base Markov chain used to propose moves for the Metropolis-Hastings chain is a random walk that data-adaptively prunes and grows the SRP histogram tree. We use a prior distribution based on Catalan numbers and detect convergence heuristically. The L1-consistency of the initializing strategy over SRP histograms using a data-driven randomized priority queue based on a generalized statistically equivalent blocks principle is proved by bounding the Vapnik-Chervonenkis shatter coefficients of the class of SRP histogram partitions. The performance of our posterior mean SRP histogram is empirically assessed for large sample sizes simulated from several multivariate distributions that belong to the space of SRP histograms.
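The regular paving data-structure of the abstract can be sketched in a few lines. This is a toy version of my own, not the authors’ implementation. Each node holds a box and a cached count. A split bisects the box along its first widest side, and the histogram value on a leaf is count / (n × volume). The growth rule used here, which always splits the most populated leaf, is a deterministic stand-in (an assumption) for the paper’s randomized statistically-equivalent-blocks priority queue. No Metropolis-Hastings moves or Catalan prior are included. The names `SRPNode`, `grow` and `density` are hypothetical.

```python
import heapq
import numpy as np

class SRPNode:
    """Hypothetical minimal statistical regular paving node: a box plus a cached count."""

    def __init__(self, lower, upper, points):
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.points = points            # kept only so that the leaf can be split later
        self.count = len(points)        # cached sufficient statistic of this box
        self.left = self.right = None
        self.split_dim = self.split_val = None

    def is_leaf(self):
        return self.left is None

    def volume(self):
        return float(np.prod(self.upper - self.lower))

    def bisect(self):
        # np.argmax returns the first index among ties, i.e. the "first widest side"
        d = int(np.argmax(self.upper - self.lower))
        mid = 0.5 * (self.lower[d] + self.upper[d])
        left_upper, right_lower = self.upper.copy(), self.lower.copy()
        left_upper[d] = right_lower[d] = mid
        in_left = self.points[:, d] < mid
        self.left = SRPNode(self.lower, left_upper, self.points[in_left])
        self.right = SRPNode(right_lower, self.upper, self.points[~in_left])
        self.split_dim, self.split_val = d, mid
        return self.left, self.right

    def leaves(self):
        if self.is_leaf():
            return [self]
        return self.left.leaves() + self.right.leaves()

def grow(root, max_count):
    """Greedily bisect the most populated leaf until every leaf holds at most max_count points."""
    heap, tick = [(-root.count, 0, root)], 1   # tick breaks ties without comparing nodes
    while heap and -heap[0][0] > max_count:
        _, _, leaf = heapq.heappop(heap)
        for child in leaf.bisect():
            heapq.heappush(heap, (-child.count, tick, child))
            tick += 1
    return root

def density(root, x):
    """Histogram estimate count / (n * volume) on the leaf box containing x."""
    node = root
    while not node.is_leaf():
        node = node.left if x[node.split_dim] < node.split_val else node.right
    return node.count / (root.count * node.volume())

rng = np.random.default_rng(0)
pts = rng.uniform(size=(1000, 2))
root = grow(SRPNode([0.0, 0.0], [1.0, 1.0], pts), max_count=50)
```

Each parent count equals the sum of its children’s counts. This is what makes merging or pruning subtrees cheap. The greedy loop assumes that the data have no large ties, since many identical points would never fall below `max_count`.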
The paper actually appeared in the special issue of TOMACS that Arnaud Doucet and I edited last year. It is coauthored by Dominic Lee, Jennifer Harlow and Gloria Teng. Unfortunately, Raazesh could not connect to our video-projector. Or fortunately, as he gave a blackboard talk that turned out to be fairly intuitive and interactive.
During a short if productive visit to Dublin for an SFI meeting on Tuesday/Friday, I had the opportunity to visit the National Gallery of Ireland in my sole hour of free time (as my classy hotel was very close). The building itself is quite nice. From the outside it is neatly inserted between brick houses, while inside it provides impressive height, space, and light.
The masterpiece gallery is quite small (unless I missed a floor!), albeit filled with masterpieces, like a painting by Caillebotte I did not know.
The modern art gallery was taken over by a temporary (and poorly displayed) exhibit that included live happenings (five persons wearing monkish outfits standing around a mummy floating in mid-air), tags (!), and two interesting pieces. The first one was made of several tables filled with piles of books glued together and sculpted. The output looked like 2-D histograms and reminded me of the fear histograms discussed on Statisfaction by Julyan a few days ago. (Note the Mathematica book in the last picture!) While I love books very much, I am also quite interested in sculptures involving books. A few years ago I saw one where the artist had grown different cereals on open books: although it may sound like an easy trick (food for thought and all that), the result was amazing and impressive!
The second piece was a beautiful board illuminated by diodes, which felt very warm and comforting, maybe in reminiscence of the maternal womb, of candles, or of myriads of galaxies, but very powerful in any case. (I usually dislike constructs involving light, like the neon sculptures of the 80s, so I started with a prior against it.) I could have stayed there for hours…
Following my earlier posts on the revision of Lack of confidence, here is an interesting outcome of the derivation of the exact marginal likelihood in the Laplace case. Computing the posterior probability of a normal model versus a Laplace model in the normal (gold) and the Laplace (chocolate) settings leads to the above histogram(s). They show that the Bayesian solution is discriminating (in a frequentist sense), even for 21 observations. If I instead use R density() on the posterior probabilities, I get this weird and unmotivated flat density in the Laplace case. It looks as if the (frequentist) density of the posterior probability under the alternative were uniform, although there is no reason for this phenomenon!
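The kind of simulation behind such histograms can be sketched as follows. This is a simplified version and not the exact marginal likelihood computation of the post. As an assumption of mine, both models are fully specified: a standard normal versus a Laplace distribution with scale 1/√2 (hence also unit variance). With no unknown parameters left, the marginal likelihoods reduce to likelihoods. Both models receive prior weight 1/2. The function names are hypothetical.

```python
import numpy as np

LOG_2PI = np.log(2 * np.pi)
B = 1 / np.sqrt(2)  # Laplace scale giving unit variance (assumption)

def loglik_normal(x):
    # Sum over the last axis: one log-likelihood per replicated sample
    return np.sum(-0.5 * LOG_2PI - 0.5 * x**2, axis=-1)

def loglik_laplace(x):
    return np.sum(-np.log(2 * B) - np.abs(x) / B, axis=-1)

def posterior_prob_normal(x):
    """P(normal model | x) with equal prior weights and no unknown parameters."""
    llr = loglik_normal(x) - loglik_laplace(x)
    return 1 / (1 + np.exp(-llr))

rng = np.random.default_rng(1)
n_obs, n_rep = 21, 5000
p_under_normal = posterior_prob_normal(rng.normal(size=(n_rep, n_obs)))
p_under_laplace = posterior_prob_normal(rng.laplace(scale=B, size=(n_rep, n_obs)))

# Frequentist distributions of the posterior probability under each model
hist_normal, _ = np.histogram(p_under_normal, bins=20, range=(0, 1))
hist_laplace, _ = np.histogram(p_under_laplace, bins=20, range=(0, 1))
```

Here `hist_normal` should pile up towards 1 and `hist_laplace` towards 0. A faithful reproduction would replace the two likelihoods with the exact marginal likelihoods derived in the paper.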
Another meaningless graph found in the November issue of La Recherche: a histogram of the predictions of the world population by 2005, attached to a brief discussion of the challenges of feeding this population. No mention is made of the source(s) for this absurd agglomerate of predictions (could I add mine as well?!), while the discussion picks the median prediction as its reference number: as if Science were run by majority rule… As an unflattering coincidence (for La Recherche!), the other French monthly popular science magazine, Pour la Science, has simultaneously published a rather well-argued special issue on randomness (by Jaroslaw Strzalko, Juliusz Grabski and Tomasz Kapitaniak, who are Polish physicists), referring to a recent paper by Persi Diaconis on the randomness of coin tosses. Being associated with Scientific American certainly helps in producing quality papers! (There is also a paper by Ivar Ekeland in the same issue, as well as the paper by Andrew Gelman already mentioned.)