Archive for the Running Category
Over the past weekend, several men were killed in Spain by running bulls, most of them in the streets of Pamplona and one in a bullring. The argument behind those events is to invoke “tradition”, which means it has been happening for several centuries, starting (?) with the Minoan bull leapers. But this does not make an argument when lives are at stake and animals are turned into killing machines (and often injured in the process). A lot of other absurd and cruel traditions have rightly disappeared over the years… The European Union does not yet prohibit this animal mistreatment, although it prohibits EU funds from being used to support it. It should, as it is high time bull running and bull fighting are prohibited everywhere in Spain (as bull fighting currently is in both Catalunya and the Canary Islands). [I realise this is not the number one problem in the World and that many more people died the same weekend of deaths that could have been avoided; it is just that it would be so simple to eliminate this one (dis)reason.]
The past week I spent in Warwick ended with a workshop on retrospective Monte Carlo, which covered exact sampling, debiasing, Bernoulli factory problems and multi-level Monte Carlo, a definitely exciting package! (Not to mention opportunities to go climbing with some participants.) In particular, several talks focussed on the debiasing technique of Rhee and Glynn (2012) [inspired by von Neumann and Ulam, and already discussed in several posts here], including results in functional spaces, as demonstrated by a multifaceted talk by Sergios Agapiou who merged debiasing, deburning, and perfect sampling.
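For readers unfamiliar with the debiasing trick, here is a minimal Python sketch of the randomised-truncation (coupled sum) idea, under simplifying assumptions of my own: the approximating sequence Y_n is the deterministic partial sum of the series for e, and the truncation level N is geometric with P(N ≥ n) = (1−p)ⁿ. The names `debiased_draw` and `increment`, and the choice p = 1/2, are hypothetical; in actual uses Y_n would be a random, coupled approximation (e.g., a discretised diffusion at level n).

```python
import math
import random


def debiased_draw(delta, p, rng):
    """One draw of a randomised-truncation (coupled sum) estimator.

    delta(n) is the increment Y_n - Y_{n-1} (with Y_{-1} = 0) of a sequence
    converging to the quantity of interest. The truncation level N satisfies
    P(N >= n) = (1 - p) ** n, i.e. N is geometric on {0, 1, 2, ...}.
    """
    # Draw N as the number of failures before the first success.
    n_max = 0
    while rng.random() >= p:
        n_max += 1
    # Reweight each increment by 1 / P(N >= n) so that the expectation telescopes.
    return sum(delta(n) / (1.0 - p) ** n for n in range(n_max + 1))


def increment(n):
    """Hypothetical example: Y_n = sum_{k<=n} 1/k!, so that lim Y_n = e."""
    return 1.0 / math.factorial(n)


rng = random.Random(1)
draws = [debiased_draw(increment, 0.5, rng) for _ in range(100_000)]
debiased_mean = sum(draws) / len(draws)
print(debiased_mean, math.e)  # unbiased for e, yet each draw uses finitely many terms
```

Each draw costs E[N] + 1 = 2 terms on average here, and its variance is finite since every draw is bounded by Σ 2ⁿ/n! = e².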
From a general perspective on unbiasing, while there exist sufficient conditions to ensure finite variance and aim at an optimal version, I feel a broader perspective should be adopted towards comparing those estimators with biased versions that take less time to compute. In a diffusion context, Chang-han Rhee presented a detailed argument as to why his debiasing solution achieves an O(√n) convergence rate, as opposed to the regular discretised diffusion, but multi-level Monte Carlo also achieves this convergence speed. We had a nice discussion about this point at the break, with my slow understanding that continuous time processes had much, much stronger reasons for sticking to unbiasedness. At the poster session, I had the nice surprise of reading a poster on the penalty method I discussed the same morning, used for subsampling when scaling MCMC!
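For comparison with the debiased version, here is a rough sketch of a multi-level Monte Carlo estimator for a diffusion, under assumptions that are purely illustrative: a geometric Brownian motion dX = μX dt + σX dW with hypothetical parameters, an Euler scheme with 2ˡ steps at level l, coarse paths driven by pairwise sums of the fine Brownian increments, and a naive halving of the sample size per level (not the optimal allocation). The Rhee and Glynn estimator can be read as randomising over the same telescoping sum instead of stopping at a fixed maximum level, which is where the residual discretisation bias of this version comes from.

```python
import math
import random


def level_difference(level, rng, mu, sigma, T, x0):
    """One coupled sample of P_level - P_{level-1} for an Euler scheme.

    P_l approximates X_T with 2**l Euler steps; the coarse path reuses the
    fine Brownian increments summed in pairs. At level 0, P_{-1} = 0.
    """
    n_fine = 2 ** level
    h = T / n_fine
    x_fine = x_coarse = x0
    dw_pair = 0.0
    for k in range(n_fine):
        dw = rng.gauss(0.0, math.sqrt(h))
        x_fine += mu * x_fine * h + sigma * x_fine * dw
        dw_pair += dw
        if level > 0 and k % 2 == 1:
            # one coarse step of length 2h driven by the two fine increments
            x_coarse += mu * x_coarse * (2 * h) + sigma * x_coarse * dw_pair
            dw_pair = 0.0
    if level == 0:
        return x_fine
    return x_fine - x_coarse


def mlmc_estimate(max_level, n0, rng, mu=0.05, sigma=0.2, T=1.0, x0=1.0):
    """Telescoping estimator E[P_0] + sum_{l >= 1} E[P_l - P_{l-1}]."""
    total = 0.0
    for level in range(max_level + 1):
        n_samples = max(n0 // 2 ** level, 100)  # fewer samples on finer levels
        diffs = [level_difference(level, rng, mu, sigma, T, x0)
                 for _ in range(n_samples)]
        total += sum(diffs) / n_samples
    return total


rng = random.Random(2)
mlmc_value = mlmc_estimate(max_level=5, n0=20_000, rng=rng)
print(mlmc_value, math.exp(0.05))  # exact E[X_T] = x0 * exp(mu * T) for this GBM
```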
On the second day, Gareth Roberts talked about the Zig-Zag algorithm (which reminded me of the cigarette paper brand). This method has connections with slice sampling but it is a continuous time method which, in dimension one, means running a constant velocity particle that starts at a uniform value between 0 and the maximum density value and proceeds horizontally until it hits the boundary, at which time it moves to another uniform. Roughly. More specifically, this approach uses piecewise deterministic Markov processes, a radically new approach to simulating complex targets based on continuous time simulation, with computing times that [counter-intuitively] do not increase with the sample size.
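A minimal one-dimensional sketch of a Zig-Zag process may help, assuming a standard Gaussian target, for which the velocity-switching rate max(0, v·x) can be inverted in closed form. This is the usual velocity-flipping formulation of the process rather than the slice-sampling picture above, and it ignores the subsampling aspect behind the claim on computing times. Ergodic averages are integrated along the continuous trajectory, not evaluated at event times; the function name is hypothetical.

```python
import math
import random


def zigzag_gaussian(n_events, x0=0.0, seed=0):
    """1D Zig-Zag process targeting N(0, 1), with velocity v in {-1, +1}.

    Returns time averages of x and x**2 along the continuous path.
    """
    rng = random.Random(seed)
    x, v = x0, 1.0
    total_time = int_x = int_x2 = 0.0
    for _ in range(n_events):
        # Along x + v t the switching rate is max(0, a + t) with a = v x;
        # invert its integral against an Exp(1) draw to get the event time.
        a = v * x
        e = rng.expovariate(1.0)
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + v * tau
        # Exact integrals of x(t) and x(t)**2 over the linear segment.
        int_x += x * tau + v * tau * tau / 2.0
        int_x2 += (x_new ** 3 - x ** 3) / (3.0 * v)
        total_time += tau
        x, v = x_new, -v  # flip the velocity at the switching event
    return int_x / total_time, int_x2 / total_time


mean_x, mean_x2 = zigzag_gaussian(200_000)
print(mean_x, mean_x2)  # should be close to 0 and 1
```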
Mark Huber gave another exciting talk around the Bernoulli factory problem, connecting it with perfect simulation and demonstrating that this is not solely a formal Monte Carlo problem! Some earlier posts here have discussed papers on that problem, but I was unaware of the results bounding [from below] the expected number of steps needed to simulate B(f(p)) from a (p,1-p) coin, if not of the open questions surrounding B(2p). The talk was also great in that it centred on recursion and included a fundamental theorem of perfect sampling! Not that surprising given Mark’s recent book on the topic, but exhilarating nonetheless!!!
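The simplest Bernoulli factory, von Neumann's trick for f(p) = 1/2, already shows the flavour of the problem, including the random number of coin flips whose expectation the lower bounds above are about. In this sketch (function names hypothetical) the expected number of flips is 1/(p(1−p)), which blows up as p approaches 0 or 1.

```python
import random


def von_neumann_half(flip):
    """Bernoulli factory for f(p) = 1/2 from a p-coin (von Neumann's trick).

    Flips the coin in pairs until the two outcomes differ and returns the
    first one, since P(HT) = P(TH) = p(1-p). Also returns the number of flips.
    """
    n_flips = 0
    while True:
        first, second = flip(), flip()
        n_flips += 2
        if first != second:
            return first, n_flips


rng = random.Random(3)
p = 0.3


def coin():
    """The (p, 1-p) coin: True with probability p."""
    return rng.random() < p


results = [von_neumann_half(coin) for _ in range(20_000)]
freq = sum(r for r, _ in results) / len(results)
mean_flips = sum(n for _, n in results) / len(results)
print(freq, mean_flips, 1 / (p * (1 - p)))
```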
The final talk of the second day was given by Peter Glynn, with connections to Chang-han Rhee’s talk the previous day, but with a different twist. In particular, Peter showed how to achieve perfect or exact estimation rather than perfect or exact simulation by a fabulous trick: perfect sampling is better understood through the construction of random functions φ¹, φ², … such that X²=φ¹(X¹), X³=φ²(X²), … Hence,
$$X^{t+1}=\varphi^{t}\circ\varphi^{t-1}\circ\cdots\circ\varphi^{1}(X^{1})$$
which helps in constructing coupling strategies. However, since the φ’s are usually iid, the above is generally distributed like
$$Y^{t+1}=\varphi^{1}\circ\varphi^{2}\circ\cdots\circ\varphi^{t}(X^{1})$$
which seems pretty similar but offers a much better concentration as t grows. Truncating the function composition then becomes feasible, producing unbiased and more efficient estimators. (I realise this is not a particularly clear explanation of the idea, detailed in an arXival I somewhat missed. When seen this way, Y would seem much more expensive to compute [than X].)
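To make the distinction between the two composition orders concrete, here is a small sketch with hypothetical iid random affine contractions φⁱ(x) = aᵢx + bᵢ, aᵢ ~ U(0, 1/2), bᵢ ~ N(0, 1). The forward composition X applies φ¹ first and is a Markov chain that keeps moving, while the backward composition Y applies φᵗ first and settles down to a fixed (random) limit as t grows, the mechanism behind coupling from the past. This only illustrates the composition order, not Peter Glynn's estimator itself.

```python
import random


def make_maps(t, seed=0):
    """Hypothetical iid random affine contractions phi^i(x) = a_i * x + b_i."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, 0.5), rng.gauss(0.0, 1.0)) for _ in range(t)]


def forward(maps, x1):
    """X^{t+1} = phi^t o ... o phi^1 (x1): phi^1 is applied first."""
    x = x1
    for a, b in maps:
        x = a * x + b
    return x


def backward(maps, x1):
    """Y^{t+1} = phi^1 o ... o phi^t (x1): phi^t is applied first."""
    y = x1
    for a, b in reversed(maps):
        y = a * y + b
    return y


maps = make_maps(60)
for t in (10, 30, 60):
    # forward values keep fluctuating, backward values stabilise
    print(t, forward(maps[:t], 0.0), backward(maps[:t], 0.0))
```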
This Monday, I made a most pleasant trip to the Observatoire de Paris, whose campus is located in Meudon and no longer in Paris. (There also is an Observatoire de Paris campus in downtown Paris, created in 1667, where no observation can take place.) Most pleasant for many reasons. First, I was to meet with Frédéric Arenou and two visiting astrostatisticians from Kolkata, India, whom I met in Bangalore two years ago, about a neat if not simple issue of inverted mean estimation. Second, because the place is beautiful, with great views of Paris (since the Observatoire is on a ridge), and with a classical-looking building actually made of recycled castle parts after the Franco-Prussian war of 1870, and because Frédéric gave us a grand tour of the place. And third, because I went there by bike through the Forêt de Meudon, which I did not suspect was that close to home and which I crossed on downhill muddy trails that made me feel far away from Paris! This also gave me the opportunity to test the mettle of a new mountain bike elsewhere than against Parisian SUVs. (This was the first day of a relatively intense biking week, which really helped with the half-marathon training: the San Francisco ½ is in less than a month!!! And I am in wave 2!)
This series of posts is most probably becoming by now an imposition on the ‘Og readership, which either attended ISBA 2016 and does (do?) not need my impressions or did not attend and hence does (do?) not need vague impressions about talks they (it?) did not see, but indulge me in reminiscing about this last ISBA meeting (or more reasonably ignore this post altogether), now that I am back home (with most of my Sard wine bottles intact!, and a good array of Sard cheeses).
This meeting seems to be the largest ISBA meeting ever, with hundreds of young statisticians taking part in it (despite my early misgivings about the deterrent represented by the overall cost of attending the meeting). I presume holding the meeting in Europe made it easier and cheaper for most Europeans to attend (and hopefully the same will happen in Edinburgh in 2018!), as did the (somewhat unsuspected) wide availability of rental alternatives in the close vicinity of the conference resort. I also presume the same travel opportunities would not have been available in Banff, although local costs would have been lower. It was fantastic to see so many new researchers interested in Bayesian statistics and to meet some of them. And to have more sessions run by the j-Bayes section of ISBA (although I found it counterproductive that such sessions do not focus on a coherent theme). As a result, the meeting was more intense than ever and I found it truly exhausting, despite skipping most poster sessions. Maybe also because I did not skip a single session, thanks to the availability of an interesting theme for each block in the schedule. (And because I attended more [great] Sard dinners than I originally intended.) Having five sessions in parallel indeed means there is a fabulous offer of themes for every taste. It also means there are inevitably conflicts when picking one’s session.
Back to poster sessions, I feel I missed an essential part of the meeting, which made ISBA meetings so unique, but it also seems to me that the organisation of those sessions should be reconsidered in light of the rise in attendance. (And of my growing inability to stay up late!) One solution suggested by my recent AISTATS experience is to select posters so as to lower the number of posters in the four poster sessions. (The success rate for the Cadiz meeting was 35%.) The obvious downsides are the selection process (but this was done quite efficiently for AISTATS) and the potential reduction in the number of participants. A middle ground could see a smaller fraction of posters selected by this process (and published one way or another, as in machine-learning conferences) and presented during the evening poster sessions, with other posters being presented during the coffee breaks [which certainly does not help in reducing the intensity of the schedule]. Another, altogether different, solution is to extend the parallelism of oral sessions to poster sessions, by regrouping them into five or six themes or keywords chosen by the presenters and having those presented in different rooms, to split the attendance down to human level and tolerable decibels. Nothing would prevent participants from visiting several rooms in a given evening. Or posters could be kept up for several nights in a row if the number of rooms allows.
It may also be that this edition of ISBA 2016 sees the end of the resort-style meeting in the spirit of the early Valencia meetings. Edinburgh 2018 will certainly be an open-space conference, in that meals and lodgings will be “on” the participants, who may choose where and how much. I have heard many times the argument that conferences held in single hotels or resorts facilitate contacts between young and senior researchers, but I fear this is not sustainable given the growth of the audience. Holding the meeting in a reasonably close and compact location, such as a university building, should allow for a sufficient degree of interaction, as was the case at ISBA 2016. (Kerrie Mengersen also suggested that a few restaurants nearby could be designated as “favourites” for participants to interact at dinner time.) Another suggestion to reinforce networking and interaction would be to hold more satellite workshops before the main conference. It seems there could be a young Bayesian workshop in England the prior week, as well as a summer short course on simulation methods.
Organising meetings is getting increasingly complex and provides few rewards at the academic level, so I am grateful to the organisers of ISBA 2016 for having agreed to carry the burden this year. And to the scientific committee for setting the quality bar that high. (A special thought too for my friend Walter Racugno, who had the ultimate bad luck of having an accident the very week of the meeting he had contributed to organising!)
[Even though I predict this is my last post on ISBA 2016, I would be delighted to have guest posts on others’ impressions of the meeting. Feel free to send me entries!]
On Thursday, I started the day with a rather masochistic run to the nearby hills, not only because of the very early hour but also because, by following rabbit trails that were not intended for my size, I ended up being scratched by thorns and brambles all over! The run also gave me neat views of the coast around Pula. From there, it was all downhill [joke]. The first morning talk I attended was by Paul Fearnhead and about efficient change point estimation (which is an NP hard problem or close to it). The method relies on dynamic programming [which reminded me of one of my earliest Pascal codes, about optimising the outflow of a dam]. From my spectator’s perspective, I wonder[ed] about easier models, from Lasso optimisation to spline modelling followed by testing equality between bits. Later that morning, James Scott delivered the first Bayarri Lecture, created in honour of our friend Susie, who passed away between the previous ISBA meeting and this one. James gave an impressive coverage of regularisation through three complex models, with the [hopefully not degraded by my translation] message that we should [as Bayesians] focus on important parts of those models and use non-Bayesian tools like regularisation. I can understand the practical constraints for doing so, but optimisation leads us away from a Bayesian handling of inference problems, by removing the assessment of uncertainty…
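For readers curious about the dynamic programming side, here is a minimal sketch of exact penalised change point detection by optimal partitioning, the O(n²) recursion F(t) = min over s of {F(s) + C(y[s:t]) + β}, assuming a Gaussian change-in-mean cost (residual sum of squares) and a hypothetical penalty β. This is the textbook recursion, not necessarily the method presented in the talk: pruning schemes such as PELT are what make such approaches efficient on long series.

```python
import math


def optimal_partition(y, penalty):
    """Exact penalised change-in-mean detection by dynamic programming.

    Minimises the sum over segments of the residual sum of squares plus
    `penalty` per change point; returns the sorted change point positions
    (index of the first observation of each new segment).
    """
    n = len(y)
    # Prefix sums give O(1) segment costs.
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        # residual sum of squares of y[i:j] around its own mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [-penalty] + [math.inf] * n  # best[t]: optimal penalised cost of y[:t]
    last = [0] * (n + 1)                # start of the final segment of y[:t]
    for t in range(1, n + 1):
        for s in range(t):
            c = best[s] + cost(s, t) + penalty
            if c < best[t]:
                best[t], last[t] = c, s
    # Backtrack through the stored segment starts.
    changes, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            changes.append(t)
    return sorted(changes)


y = [0.0] * 50 + [5.0] * 50
print(optimal_partition(y, penalty=1.0))  # [50]
```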
Later in the afternoon, I took part in the Bayesian foundations session, discussing the shortcomings of the Bayes factor and suggesting the use of mixtures instead. With rebuttals from [friends in] the audience!
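As a reminder of what testing by mixtures means, here is a toy sketch under assumptions that are mine for illustration: both hypotheses are fully specified densities (no parameters to estimate), N(0,1) versus N(3,1), the weight α of the encompassing mixture α f₁ + (1−α) f₂ gets a Beta(1/2, 1/2) prior, and a standard data-augmentation Gibbs sampler produces the posterior of α, whose concentration near 0 or 1 then serves as the evidence in place of a Bayes factor.

```python
import math
import random


def mixture_weight_posterior_mean(x, logpdf1, logpdf2, a0=0.5,
                                  n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for alpha in alpha * f1 + (1 - alpha) * f2,
    with a Beta(a0, a0) prior on alpha; returns its posterior mean."""
    rng = random.Random(seed)
    alpha, kept = 0.5, []
    for it in range(n_iter):
        # Allocate each observation to component 1 or 2 given alpha.
        n1 = 0
        for xi in x:
            w1 = alpha * math.exp(logpdf1(xi))
            w2 = (1.0 - alpha) * math.exp(logpdf2(xi))
            n1 += rng.random() < w1 / (w1 + w2)
        # Conjugate Beta update of alpha given the allocations.
        alpha = rng.betavariate(a0 + n1, a0 + len(x) - n1)
        if it >= burn:
            kept.append(alpha)
    return sum(kept) / len(kept)


def normal_logpdf(mean):
    """Log-density of N(mean, 1)."""
    return lambda v: -0.5 * (v - mean) ** 2 - 0.5 * math.log(2 * math.pi)


data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]  # simulated under model 1
post_mean = mixture_weight_posterior_mean(x, normal_logpdf(0.0), normal_logpdf(3.0))
print(post_mean)  # should be close to 1, i.e. favouring model 1
```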
This session also included a talk by Victor Peña and Jim Berger analysing and answering the recent criticisms of the Likelihood principle. I am not sure this answer will convince the critics, but I won’t comment further, as I now see the debate as resulting from a vague notion of inference in Birnbaum’s expression of the principle. Jan Hannig gave another foundational talk introducing fiducial distributions (a.k.a. Fisher’s Bayesian mimicry) but failing to provide a foundational argument for replacing Bayesian modelling. (Obviously, I am prejudiced in this regard.)
The last session of the day was sponsored by BayesComp and saw talks by Natesh Pillai, Pierre Jacob, and Eric Xing. Natesh talked about his paper on accelerated MCMC recently published in JASA, which surprisingly has not been discussed here yet, but definitely deserves to be! Hopefully this will be corrected within a few days, once I have recovered from conference burnout!!! Pierre Jacob presented a work we are currently completing with Chris Holmes and Lawrence Murray on modularisation, inspired by the cut problem (as exposed by Plummer at MCMski IV in Chamonix). And Eric Xing spoke about embarrassingly parallel solutions, discussed a while ago here.
As an organiser of the ABC session (along with Paul Fearnhead), I was already aware of most results behind the talks, but nonetheless got some new perspectives from the presentations. Ewan Cameron discussed a two-stage ABC where the first step is actually an indirect inference step, which leads to a more efficient ABC step, with applications to epidemiology. Lu presented extensions of his work with Paul Fearnhead, incorporating regression correction à la Beaumont to demonstrate consistency and using defensive sampling to control importance sampling variance. (While we are working on a similar approach, I do not want to comment on the consistency part, but I missed how defensive sampling can operate in complex ABC settings, as it requires advance knowledge of the target to be effective.) And Ted Meeds spoke about two directions for automating ABC (as in the ABcruise), from incorporating the pseudo-random generator into the representation of the ABC target, to calling for deep learning advances. The inclusion of random generators in the transform is great, provided they can remain black boxes, as otherwise they require recoding. (This differs from quasi-Monte Carlo ABC, which aims at reducing the variability due to sheer noise.) It took me a little while, but I eventually understood why Jan Hannig saw this inclusion as a return to fiducial inference!
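For readers less familiar with the regression correction à la Beaumont, here is a minimal sketch of rejection ABC followed by a linear regression adjustment, on a toy model of my own choosing: n = 50 observations from N(θ, 1), the sample mean as summary statistic (simulated directly as N(θ, 1/n)), a U(−5, 5) prior, and an observed summary set to 1. Beaumont et al. (2002) also weight accepted values with an Epanechnikov kernel, which this sketch omits; the function name and tolerance are hypothetical.

```python
import math
import random
import statistics


def abc_regression(s_obs, n_obs=50, n_sim=20_000, eps=0.5, seed=0):
    """Rejection ABC followed by a linear regression adjustment.

    Toy model: n_obs draws from N(theta, 1), summarised by their mean,
    simulated directly as N(theta, 1/n_obs); prior theta ~ U(-5, 5).
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sim):
        theta = rng.uniform(-5.0, 5.0)                # prior draw
        s = rng.gauss(theta, 1.0 / math.sqrt(n_obs))  # simulated summary
        if abs(s - s_obs) < eps:
            accepted.append((theta, s))
    thetas = [t for t, _ in accepted]
    summaries = [s for _, s in accepted]
    # Least-squares slope of theta on the summary among accepted pairs.
    m_t, m_s = statistics.fmean(thetas), statistics.fmean(summaries)
    slope = (sum((s - m_s) * (t - m_t) for t, s in accepted)
             / sum((s - m_s) ** 2 for s in summaries))
    # Shift each accepted theta as if its summary had equalled s_obs.
    adjusted = [t - slope * (s - s_obs) for t, s in accepted]
    return thetas, adjusted


raw, adjusted = abc_regression(s_obs=1.0)
print(len(raw), statistics.fmean(raw), statistics.stdev(raw))
print(statistics.fmean(adjusted), statistics.stdev(adjusted))
```

The adjusted sample should be markedly less spread than the raw rejection sample, since the tolerance-induced dispersion is largely regressed away.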
Merlise Clyde gave a wide-ranging plenary talk on (linear) model selection that looked at a large range of priors under the hat of generalised confluent hypergeometric priors over the mixing scale in Zellner’s g-prior. Some were consistent under one or both models, maybe even for misspecified models. Some parts paralleled my own talk on the foundations of Bayesian tests, no wonder since I mostly give a review before launching into a criticism of the Bayes factor, which I think may be a more productive perspective than trying to overcome the shortcomings of Bayes factors in weakly informative settings. Some comments at the end of Merlise’s talk were loosely connected to this view, in that they called for a unitarian perspective [rather than adapting a prior to a specific inference problem] with decision-theoretic backup. Conveniently, the next session was about priors and testing, obviously connected!, with Leo Knorr-Held considering g-priors for the Cox model, Kerrie Mengersen discussing priors for over-fitted mixtures and HMMs, and Dan Simpson entertaining us with his quest for a prior for a point process, eventually reaching PC priors.
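Since the talk revolved around priors on the scale g of Zellner’s g-prior, here is a small sketch of the closed-form Bayes factor of a linear model with p covariates against the intercept-only model for a fixed g, BF = (1+g)^((n−p−1)/2) {1 + g(1−R²)}^(−(n−1)/2), as in Liang et al. (2008), together with a crude midpoint-rule average over g under the hyper-g prior with a = 4, which makes g/(1+g) uniform on (0,1). That prior is chosen here purely for its simplicity; the talk considered a much wider family, and the function names are hypothetical.

```python
import math


def log_bf_g_prior(r2, n, p, g):
    """Log Bayes factor of a p-covariate linear model (coefficient of
    determination r2, n observations) versus the intercept-only model
    under Zellner's g-prior with fixed g."""
    return (0.5 * (n - p - 1) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))


def bf_hyper_g4(r2, n, p, m=10_000):
    """Bayes factor under the hyper-g prior with a = 4, i.e. with
    u = g / (1 + g) uniform on (0, 1), by the midpoint rule in u."""
    total = 0.0
    for k in range(m):
        u = (k + 0.5) / m
        total += math.exp(log_bf_g_prior(r2, n, p, u / (1.0 - u)))
    return total / m


print(math.exp(log_bf_g_prior(0.3, 50, 2, g=50)))  # unit-information choice g = n
print(bf_hyper_g4(0.3, 50, 2))
```

When R² = 0 the fixed-g Bayes factor reduces to (1+g)^(−p/2), a convenient sanity check.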