Just another perfect day in Providence! After a brisk run in the early morning, which took me through the Brown campus, I attended the lecture by Sean Meyn on feedback particle filters. As it was mostly on diffusions with control terms, just too far from my field, I missed most of the points. (My fault, not Sean’s!) Then Ramon van Handel gave a talk about the curse(s) of dimensionality in particle filters, much closer to my interests, with a good summary of why (optimal) filters were not suffering from a curse in n, the horizon size, but in d, the dimension of the space, followed by an argument that some degree of correlation decay could overcome this dimensional curse as well. After the lunch break (where I thought further about the likelihood principle!), Dana Randall gave a technical talk on mixing properties of the hardcore model on Z² and bounding the cutoff parameter, which is when I appreciated the ability to follow talks from the ICERM lounge, watching slides and video of the talk taking place on the other side of the wall! At last, and in a programming counterpoint from slowly mixing to fastest mixing, Jim Fill presented his recent work on ordering Markov chains and finding fastest-mixing chains, which of course reminded me of Peskun ordering although there may be little connection in the end. The poster session in the evening had sufficiently few posters to make the discussion with each author enjoyable and relevant. A consistent feature of the meeting thus, allowing for quality interaction time between participants. I am now looking forward to the final day with a most intriguing title by my friend Eric Moulines on TBA…
Archive for November, 2012
As mentioned in my review of Paradoxes in Scientific Inference, I was a bit confused by this presentation of the likelihood principle and this led me to ponder for a week or so whether or not there was an issue with Birnbaum’s proof (or, much more likely, with my vision of it!). After reading Birnbaum’s proof again, while sitting down in a quiet room at ICERM for a little while, I do not see any reason to doubt it. (Keep reading at your own risk!)
My confusion was caused by mixing sufficiency in the sense of Birnbaum’s mixed experiment with sufficiency in the sense of our ABC model choice PNAS paper, namely that sufficient statistics are not always sufficient to select the right model. The sufficient statistic in the proof reduces the (2,x2) observation from Model 2 to (1,x1) from Model 1 when there is an observation x1 that produces a likelihood proportional to the likelihood for x2, and the statistic is indeed sufficient: the distribution of (2,x2) given (1,x1) does not depend on the parameter θ. Of course, the statistic is not sufficient (most of the time) for deciding between Model 1 and Model 2, but this model choice issue is foreign to Birnbaum’s construction.
As I mentioned yesterday, and earlier, I was rather excited by the visit to the ICERM building. As it happens, the centre is located on the upper floor of a (rather bland!) 11-floor building sitting between Main St. and the river. It is quite impressive indeed, with a feeling of space due to the high ceilings and the glass walls all around the conference room, plus pockets of quietness with blackboards to the rescue. The whiteboard that makes the wall between the conference room and the lobby is also appreciable for discussion as it is huge (the whole wall is the whiteboard!) and made of a glassy material that makes writing on it a true pleasure (the next step would be to have a recording device embedded in it!). When I gave my talk and attended the other three talks of the day, I kind of regretted that the dual projector system would not allow for a lag of sorts in the presentation. Even though the pace of the other talks was quite reasonable (mine was a bit hurried I am afraid!), writing down a few notes was enough for me to miss some point from the previous slide. With huge walls, it should be easy to project at least the previous slide at the same time and maybe even all of the previous slides (maybe, maybe not, as it would get quickly confusing…)
Paul Dupuis’ talk covered new material (at least for me) on importance sampling for diffusions and the exploration of equilibria, and it was thus quite enjoyable, even when fighting one of my dozing attacks. Gareth Roberts’ talk provided a very broad picture of the different optimal scalings (à la 0.234!) for MCMC algorithms (while I have attended several lectures by Gareth on this theme, there is always something new and interesting coming out of them!). Krzysztof Latuszynski’s talk on irreducible diffusions and the construction of importance sampling solutions replacing the (unavailable) exact sampling of Beskos et al. (2006) led to some discussion on the handling of negative weights. This is a question that has always intrigued me: if unbiasedness or exact simulation or something else induces negative weights in a sample, how can we process those weights when resampling? The conclusion of the discussion was that truncating the weights to zero seemed like the best solution, at least when resampling, since the weights can be used as such in averages, but I wonder if there is a more elaborate scheme involving mixtures or whatnot!
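To make the conclusion of that discussion concrete, here is a minimal Python sketch of the two uses of signed weights: kept as such in a self-normalised average, and truncated at zero before multinomial resampling. The function names `signed_average` and `resample_truncated` are made up for the illustration, and the toy values are not taken from any of the talks.

```python
import numpy as np


def signed_average(values, weights):
    """Self-normalised weighted average; negative weights are used as such."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * values) / np.sum(weights)


def resample_truncated(values, weights, size, rng):
    """Multinomial resampling after truncating negative weights to zero.

    This is only the simple fix mentioned in the discussion, not a
    recommendation: truncation introduces a bias in general.
    """
    values = np.asarray(values)
    clipped = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    total = clipped.sum()
    if total <= 0:
        raise ValueError("no positive weight left after truncation")
    idx = rng.choice(len(values), size=size, p=clipped / total)
    return values[idx]


rng = np.random.default_rng(0)
values = np.array([0.0, 1.0, 2.0])
weights = np.array([0.5, -0.2, 0.7])   # one negative weight (toy example)
print(signed_average(values, weights))
print(resample_truncated(values, weights, size=10, rng=rng))
```

The point with a negative weight can never be resampled under this scheme, while it still contributes (negatively) to the average.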
Another read today and not from JRSS B for once, namely, Efron’s (an)other look at the Jackknife, i.e. the 1979 bootstrap classic published in the Annals of Statistics. My Master students in the Reading Classics Seminar course thus listened today to Marco Brandi’s presentation, whose (Beamer) slides are here:
In my opinion this was an easier paper to discuss, more because of its visible impact than because of the paper itself, where the comparison with the jackknife procedure does not sound so relevant nowadays. The paper is again mostly algorithmic and requires some background on how it impacted the field. Even though Marco also went through Don Rubin’s Bayesian bootstrap and Michael Jordan’s bag of little bootstraps, he struggled to get away from the technicality towards the intuition and the relevance of the method. The Bayesian bootstrap extension was quite interesting in that we discussed at length the connections with Dirichlet priors and the lack of parameters, which sounded quite antagonistic to Bayesian principles. However, at the end of the day, I feel that this foundational paper was not explored in proportion to its depth and that it would be worth another visit.
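For readers who prefer the intuition in code, here is a small Python sketch contrasting the nonparametric bootstrap (resampling the data with replacement) with the Bayesian bootstrap (drawing Dirichlet(1,…,1) weights on the observed points). It is an illustration under my own choices, not a transcription of either paper: the function names, the simulated data and the number of replications are all assumptions.

```python
import numpy as np


def bootstrap_se(x, stat, n_boot, rng):
    """Bootstrap standard error of stat(x): resample x with replacement."""
    x = np.asarray(x)
    n = len(x)
    reps = np.array(
        [stat(x[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    )
    return reps.std(ddof=1)


def bayesian_bootstrap_mean(x, n_draws, rng):
    """Bayesian bootstrap draws for the mean.

    Each draw reweights the observed points with Dirichlet(1,...,1) weights
    instead of resampling them, so no parametric model is involved.
    """
    x = np.asarray(x, dtype=float)
    w = rng.dirichlet(np.ones(len(x)), size=n_draws)  # shape (n_draws, n)
    return w @ x


rng = np.random.default_rng(42)
x = rng.normal(size=200)                      # toy data, for illustration only
print(bootstrap_se(x, np.mean, 2000, rng))    # compare with s / sqrt(n)
print(x.std(ddof=1) / np.sqrt(len(x)))
print(bayesian_bootstrap_mean(x, 2000, rng).std(ddof=1))
```

For the mean, the three printed numbers should be close to one another, which is one way to see why the Bayesian bootstrap behaves so much like its frequentist cousin.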
I have just arrived in Providence, RI, for the ICERM workshop on Performance Analysis of Monte Carlo Methods. While the plane trip was uneventful and even relaxing, as I could work on the revision to our ABCel (soon to be BCel!) paper, the bus trip from Boston to Providence, while smooth, quiet, wirelessed, and on-time, was a wee bit too much as it was already late for my standards… Anyway, I am giving one of the talks tomorrow, with a pot-pourri on ABC and empirical likelihood as in Ames and Chicago last month. The format of the workshop sounds very nice, with only four talks a day, which should leave a lot of space for interactions between participants (if I do not crash from my early early rise…) And, as mentioned earlier, I am looking forward to visiting the futuristic building.
Richard Everitt tweeted yesterday about a recent publication in JCGS by Rajib Paul, Steve MacEachern and Mark Berliner on convergence assessment via stratification. (The paper is free-access.) Since this is another clear interest of mine, I had a look at the paper on the train to Besançon. (And wrote this post as a result.)
The idea therein is to compare the common empirical average with a weighted average relying on a partition of the parameter space: restricted means are computed for each element of the partition and then weighted by the probability of the element. Of course, those probabilities are generally unknown and need to be estimated simultaneously. If applied as is, this idea reproduces the original empirical average! So the authors use instead batches of simulations and corresponding estimates, weighted by the overall estimates of the probabilities, in which case the estimator differs from the original one. The convergence assessment is then to check that both estimates are comparable, using for instance Galin Jones’ batch method since they have the same limiting variance. (I thought we mentioned this damning feature in Monte Carlo Statistical Methods, but cannot find a trace of it except in my lecture slides…)
The difference between both estimates is the addition of weights p_in/q_ijn, made of the ratio of the estimates of the probability of the ith element of the partition. This addition thus introduces an extra element of randomness in the estimate and this is the crux of the convergence assessment. I was slightly worried though by the fact that the weight is in essence a harmonic mean, i.e. 1/q_ijn/Σ q_imn… Could it be that this estimate has no finite variance for a finite sample size? (The proofs in the paper all consider the asymptotic variance using the delta method.) However, having the weights adding up to K alleviates my concerns. Of course, as with other convergence assessments, the method is not fool-proof in that tiny, isolated, and unsuspected spikes not (yet) visited by the Markov chain cannot be detected via this comparison of averages.
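Here is a rough Python sketch of the comparison as I read it from the description above, and not the authors’ implementation: restricted means are computed within each batch, weighted by the overall stratum frequencies, and averaged over batches. The function name, the partition by fixed cut points, the i.i.d. normal draws standing in for MCMC output, and the handling of strata not visited within a batch (dropped, with the remaining weights renormalised) are all assumptions of mine.

```python
import numpy as np


def stratified_batch_estimate(x, h, edges, n_batches):
    """Batch-wise stratified estimate of E[h(X)] from a sequence of draws x.

    Strata are the intervals delimited by `edges`. Stratum probabilities are
    estimated once from the whole sequence, restricted means once per batch.
    """
    x = np.asarray(x, dtype=float)
    hx = h(x)
    strata = np.digitize(x, edges)            # labels in 0..len(edges)
    n_strata = len(edges) + 1
    p = np.bincount(strata, minlength=n_strata) / len(x)
    estimates = []
    for idx in np.array_split(np.arange(len(x)), n_batches):
        counts = np.bincount(strata[idx], minlength=n_strata)
        sums = np.bincount(strata[idx], weights=hx[idx], minlength=n_strata)
        visited = counts > 0
        means = sums[visited] / counts[visited]
        # Assumption: strata unvisited in this batch are dropped and the
        # remaining overall weights are renormalised.
        estimates.append(np.sum(p[visited] * means) / p[visited].sum())
    return float(np.mean(estimates))


rng = np.random.default_rng(3)
x = rng.normal(size=10_000)                   # stand-in for MCMC output
edges = np.array([-1.0, 0.0, 1.0])


def square(v):
    return v ** 2


print(np.mean(square(x)))                               # plain average
print(stratified_batch_estimate(x, square, edges, 1))   # same value
print(stratified_batch_estimate(x, square, edges, 20))  # differs slightly
```

With a single batch the stratified estimate collapses to the plain empirical average, which is the degeneracy noted above; with several batches the two estimates differ, and a large discrepancy would be the warning sign.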
Both Pour la Science and La Recherche, two French science magazines, had an entry this month on the abc conjecture! However, ABC being a common acronym, it is alas unrelated to my research theme. The abc conjecture is a number theory conjecture that states that if a and b are integers with no common factor and a small number of prime divisors, this does not hold for c=a+b. This is the abc triplet. (More precisely, the conjecture states that the quality of the triplet abc:
q(a,b,c) = log(c) / log(rad(abc)),
with rad(abc) the product of the distinct prime factors of abc, is larger than 1+ε for only a finite number of triplets abc.) A proof of the conjecture by Shinichi Mochizuki was recently proposed, hence the excitement in the community. In La Recherche, I read that this conjecture is associated with an interesting computing challenge, namely to find the exhaustive collection of triplets with a quality more than a given bound 1+ε.
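As a toy version of this computing challenge, here is a brute-force Python sketch that lists all coprime triplets a+b=c with c below a bound and quality above 1+ε. It assumes the usual definition of the quality, log(c)/log(rad(abc)), where rad(abc) is the product of the distinct primes dividing abc; the function names are mine, and the quadratic search is obviously hopeless for the bounds of interest in the actual challenge.

```python
from math import gcd, log


def radical(n):
    """Product of the distinct prime factors of n, by trial division."""
    rad, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            rad *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:          # whatever is left is a prime factor
        rad *= n
    return rad


def quality(a, b, c):
    """Quality of the triplet (a, b, c): log(c) / log(rad(abc))."""
    return log(c) / log(radical(a * b * c))


def abc_hits(c_max, eps=0.0):
    """All coprime triplets a < b, a + b = c <= c_max, with quality > 1 + eps."""
    hits = []
    for c in range(3, c_max + 1):
        for a in range(1, c // 2 + 1):
            b = c - a
            if gcd(a, b) != 1:
                continue
            q = quality(a, b, c)
            if q > 1 + eps:
                hits.append((a, b, c, q))
    return hits


for a, b, c, q in abc_hits(100):
    print(f"{a} + {b} = {c}, quality {q:.4f}")
```

The smallest such triplet is 1+8=9, with radical 6 and quality log(9)/log(6), about 1.226.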