With my daughter Rachel and her friend Clément, I tried last week to bag a few summits in the Monte Rosa massif, which stands between Italy (Aosta) and Switzerland (Zermatt). I wanted to take advantage of the Bastille Day break, so we drove from Paris to Aosta in the very early morning, stopping in Chamonix to rent shoes and crampons, and met our guide Abele Blanc at noon, before going up together to the hut, Rifugio Città di Mantova, at 3500m. Our goal was to spend the night there and climb to Punta Gnifetti (Rifugio Margherita) and Zumstein the next morning, before heading back to Paris in the evening. However, it did not work out that way, as I got a slight bout of mountain sickness that left me migrainous and nauseous, with a pretty bad night despite great conditions at the hut… So (despite my intense training of the previous weeks!) I did not feel that great when we left the hut at 5am. The weather was fine if cold and windy, but after two hours of moderate climbing on the fairly pleasant crisp snow of a glacier, Rachel was too out of breath to continue and Abele realised my nose had [truly] frozen (I could not feel anything!), so he took us down before continuing with Clément to both peaks. This was quite a disappointment as we had planned this trip over several months, but it was clearly for the best as my fingers were definitely close to frozen (with my worst case ever of screamin’ barfies on the way down!). And we thus spent the rest of the morning waiting for our friends, warming up with tea in the sunshine. Upon reflection, planning one extra day of acclimatisation to altitude and cold would have been more reasonable, and so would keeping handwarmers in our backpacks… In any case, Clément made it to the top with Abele and we got some good altitude training for the upcoming San Francisco half-marathon. Plus an epic hike the next day around Cogne.
[A call from the BayesComp section of ISBA for the next Bayesian computation meeting! As suggested in an earlier post, the label MCMski is discontinued to allow for any location able to host a meeting of 200-plus participants in good and hopefully reasonably priced conditions.]
The Bayesian Computation Section of ISBA is soliciting proposals to host its flagship meeting: BayesComp 2018
The expectation is that the meeting will be held in January 2018, but the committee will consider proposals for other times through January 2019. This meeting is a continuation of the popular MCMSki on recent advances in the theory and application of Bayesian computational methods such as MCMC. The tradition was to hold MCMski meetings in ski resorts, but, as the name change suggests, we encourage applications from any venue that could support BC2018.
A three-day meeting is planned, perhaps with an additional day or two of satellite meetings and/or short courses. One-page proposals should address the feasibility of hosting the meeting, including:
1. Proposed dates.
2. Transportation for international participants (both the proximity of international airports and transportation to/from the venue).
3. The conference facilities.
4. The availability and cost of hotels, including low cost options.
5. The proposed local organizing committee and their collective experience organizing international meetings.
6. Expected or promised contributions from the host organization, host country, or industrial partners towards the cost of running the meetings.
Proposals should be submitted to Nicolas Chopin (Program Chair) no later than May 31, 2016. The Board of the Bayesian Computation Section will evaluate the proposals, choose a venue, and appoint the Program Committee for BayesComp 2018.
As CHANCE book editor, I received the other day from Oxford University Press the proceedings of an École de Physique des Houches on Statistical Physics, Optimisation, Inference, and Message-Passing Algorithms that took place there from September 30 to October 11, 2013. As it is mostly unrelated to Statistics, and since Igor Caron already reviewed the book more than a year ago, I skimmed through the few chapters connected to my interests, from Devavrat Shah’s chapter on graphical models and belief propagation, to Andrea Montanari’s denoising and sparse regression, including LASSO, and only read in some detail Manfred Opper’s expectation propagation chapter. This chapter made me realise (or re-realise as I had presumably forgotten an earlier explanation!) that expectation propagation can be seen as a sort of variational approximation that produces, by a sequence of iterations, the distribution within a certain parametric (exponential) family that is the closest to the distribution of interest. By writing the Kullback-Leibler divergence the opposite way from the usual variational approximation, the solution equates the expectations of the natural sufficient statistic under both models… Another interesting aspect of this chapter is the connection with estimating normalising constants. (I noticed a slight typo on p.269 in the final form of the Kullback approximation q() to p().)
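For the record, here is a short derivation of that moment-matching property, as a sketch in my own notation rather than the one used in the chapter: p is the distribution of interest, q_θ is an exponential family member with natural parameter θ, sufficient statistic T, base measure h and log-normaliser A (all of these symbols are assumptions of mine). In expectation propagation itself, this projection is applied repeatedly to intermediate "tilted" distributions rather than once to the full target.

```latex
\begin{align*}
q_\theta(x) &= h(x)\,\exp\{\theta^{\mathsf T} T(x) - A(\theta)\},\\
\mathrm{KL}(p\,\|\,q_\theta)
  &= \int p(x)\,\log\frac{p(x)}{q_\theta(x)}\,\mathrm{d}x
   = C - \theta^{\mathsf T}\,\mathbb{E}_p[T(X)] + A(\theta),\\
\nabla_\theta\,\mathrm{KL}(p\,\|\,q_\theta)
  &= -\mathbb{E}_p[T(X)] + \nabla A(\theta)
   = -\mathbb{E}_p[T(X)] + \mathbb{E}_{q_\theta}[T(X)],
\end{align*}
```

where C collects the terms that do not depend on θ. Since A is convex, the divergence is minimised exactly when the gradient vanishes, that is, when the expectation of T under q_θ equals its expectation under p. The usual variational approximation minimises KL(q_θ‖p) instead, which does not lead to this identity in general.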
The BayesComp MCMski V [or MCMskv for short] now has its official website, once again maintained by Merrill Lietchy from Drexel University, Philadelphia, and registration is even open! The call for contributed sessions is now over, while the call for posters remains open until the very end. The novelty since the previous post is that there will be a “Breaking news” session [in-between the Late news sessions at JSM and the crash poster talks at machine-learning conferences] to highlight major advances among poster submissions. And that there will be an opening talk by Steve [the Bayesian] Scott on the 4th, about the frightening prospect of MCMC death!, followed by a round-table and a welcome reception, sponsored by the Swiss Supercomputing Centre. Hence the change in dates. Which still allows for arrivals in Zürich on January 4th [be with you].
Following the highly successful [authorised opinion!, from objective sources] MCMski IV, in Chamonix last year, the BayesComp section of ISBA has decided in favour of a two-year period, which brings the great news that next year we will meet again for MCMski V [or MCMskv for short], this time on the snowy slopes of the Swiss town of Lenzerheide, south of Zürich. The committees are headed by the indefatigable Antonietta Mira and Mark Girolami. The plenary speakers have already been contacted and Steve Scott (Google), Steve Fienberg (CMU), David Dunson (Duke), Krys Latuszynski (Warwick), and Tony Lelièvre (Mines, Paris) have agreed to talk. Similarly, the nine invited sessions have been selected and will include Hamiltonian Monte Carlo, Algorithms for Intractable Problems (ABC included!), Theory of (Ultra)High-Dimensional Bayesian Computation, Bayesian NonParametrics, Bayesian Econometrics, Quasi Monte Carlo, Statistics of Deep Learning, Uncertainty Quantification in Mathematical Models, and Biostatistics. There will be afternoon tutorials, including a practical session from the Stan team (tutorials for which the call is open), poster sessions, and a conference dinner at which we will be entertained by the unstoppable Imposteriors. The Richard Tweedie ski race is back as well, with a pair of Blossom skis for the winner!
This morning, on the train to Dauphine (a train that was even more delayed than usual!), I read a recent arXival by Brendon Brewer and Courtney Donovan. Entitled Fast Bayesian inference for exoplanet discovery in radial velocity data, the paper suggests combining Matthew Stephens’ (2000) birth-and-death MCMC approach with nested sampling to infer the number N of exoplanets in an exoplanetary system. The paper is somewhat sparse in its description of the suggested approach, but states that the birth-death moves involve adding a planet with parameters simulated from the prior and removing a planet at random, both being accepted under a likelihood constraint associated with nested sampling. I actually wonder if this is the birth-death version of Peter Green’s (1995) RJMCMC rather than the continuous time birth-and-death process version of Matthew…
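To fix ideas, here is a minimal sketch of how I read such a move, and not the authors' implementation: a planet is born from the prior or a randomly chosen planet dies, and the proposal is kept only if the likelihood stays above the current nested sampling threshold. The function names, the toy likelihood and the toy prior are all hypothetical, and accepting on the likelihood constraint alone assumes a uniform prior on N over {0, …, n_max} with planets that are iid from the prior.

```python
import random

def birth_death_step(planets, log_lik, draw_from_prior, log_l_star, n_max, rng):
    """One birth-or-death proposal, kept only if it satisfies the
    nested-sampling constraint log_lik(proposal) > log_l_star."""
    proposal = list(planets)
    if rng.random() < 0.5:
        # Birth: add one planet whose parameters are drawn from the prior.
        if len(proposal) >= n_max:
            return planets  # would leave the support of N: reject
        proposal.append(draw_from_prior(rng))
    else:
        # Death: remove one planet chosen uniformly at random.
        if not proposal:
            return planets  # nothing to remove: reject
        proposal.pop(rng.randrange(len(proposal)))
    # Hard likelihood constraint (assumes a uniform prior on N, see lead-in).
    return proposal if log_lik(proposal) > log_l_star else planets

def toy_log_lik(planets):
    # Hypothetical stand-in: favours amplitudes that sum to 2.
    return -(sum(planets) - 2.0) ** 2

def toy_prior(rng):
    # Hypothetical prior: one amplitude per planet, uniform on (0, 1).
    return rng.uniform(0.0, 1.0)

rng = random.Random(1)
state = [1.0, 1.0]   # starting point that satisfies the constraint
log_l_star = -1.0    # current likelihood threshold
visited = []         # number of planets after each step
for _ in range(500):
    state = birth_death_step(state, toy_log_lik, toy_prior, log_l_star, 15, rng)
    visited.append(len(state))
```

With proposals made from the prior, most births get rejected as soon as the threshold becomes demanding, which is the inefficiency that worries me in such schemes.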
“The traditional approach to inferring N also contradicts fundamental ideas in Bayesian computation. Imagine we are trying to compute the posterior distribution for a parameter a in the presence of a nuisance parameter b. This is usually solved by exploring the joint posterior for a and b, and then only looking at the generated values of a. Nobody would suggest the wasteful alternative of using a discrete grid of possible a values and doing an entire Nested Sampling run for each, to get the marginal likelihood as a function of a.”
This criticism is valid when there is a huge number of possible values of N, even though I see no fundamental contradiction with my ideas about Bayesian computation. However, it is more debatable when there are only a few possible values for N, given that the exploration of the augmented space by an RJMCMC algorithm is often very inefficient, in particular when the proposed parameters are generated from the prior. All the more so when nested sampling is involved and simulations are run under the likelihood constraint! In the astronomy examples given in the paper, N never exceeds 15… Furthermore, by merging all N’s together, it is unclear how the evidences associated with the various values of N can be computed. At least, those are not reported in the paper.
The paper also does not provide the likelihood function, so I do not completely understand where “label switching” occurs therein. My first impression is that this is not a mixture model. However, if the observed signal (from an exoplanetary system) is the sum of N signals corresponding to N planets, this makes more sense.
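A toy illustration of that last point, under assumptions that are entirely mine since the likelihood is not given in the paper: each planet contributes a sinusoid (amplitude, period, phase) to the radial velocity, the noise is Gaussian, and the function names are hypothetical. Because the signal is a sum over planets, relabelling the planets leaves the likelihood unchanged, which is where label switching would come from.

```python
import math

def rv_signal(t, planets):
    """Sum of N sinusoids, one per planet: (amplitude, period, phase)."""
    return sum(a * math.sin(2 * math.pi * t / p + phi) for a, p, phi in planets)

def log_lik(times, velocities, planets, sigma=1.0):
    """Gaussian log-likelihood of the observed radial velocities."""
    ll = 0.0
    for t, v in zip(times, velocities):
        r = v - rv_signal(t, planets)
        ll += -0.5 * (r / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return ll

# Noise-free toy data from two hypothetical planets.
times = [0.5 * i for i in range(40)]
truth = [(3.0, 7.0, 0.2), (1.5, 2.3, 1.0)]
velocities = [rv_signal(t, truth) for t in times]

# Same pair of planets, with labels 1 and 2 exchanged.
ll_original = log_lik(times, velocities, [(2.8, 7.1, 0.2), (1.4, 2.3, 0.9)])
ll_swapped = log_lik(times, velocities, [(1.4, 2.3, 0.9), (2.8, 7.1, 0.2)])
```

Both log-likelihoods agree, so with an exchangeable prior on the planets the posterior has N! symmetric copies of each mode, exactly as in a mixture model.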
Last week, Michael Gutmann, Ritabrata Dutta, Samuel Kaski, and Jukka Corander posted on arXiv the latest version of the paper they had presented at MCMSki 4. As indicated by its title, Likelihood-free inference via classification, it suggests implementing ABC based on classification tools, which makes it somewhat connected to our recent random forest paper.
The starting idea in the paper is that datasets generated from distributions with different parameters should be easier to classify than datasets generated from distributions with the same parameters. And that classification accuracy naturally induces a distance between datasets and between the parameters behind those datasets. We had followed some of the same track when we started using random forests, before realising that, for our model choice setting, proceeding the entire ABC way once the random forest procedure had been constructed was counter-productive. Random forests are just too deadly as efficient model choice machines to try to compete with them through an ABC postprocessing. Performances are just… Not. As. Good!
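As a rough illustration of this starting idea (and not of the authors' actual procedure), here is a minimal sketch where the classifier is a plain midpoint-threshold rule, the model is a N(θ,1) sample, and all names are hypothetical. The held-out classification accuracy stays close to one half when both datasets share the same parameter and increases as the parameters move apart, hence can serve as a discrepancy between datasets.

```python
import random
import statistics

def classification_accuracy(x, y):
    """Held-out accuracy of a midpoint-threshold rule (1-D LDA with equal
    variances) trained to tell dataset x apart from dataset y."""
    hx, hy = len(x) // 2, len(y) // 2
    # Train on the first halves: the rule cuts midway between class means.
    mx, my = statistics.fmean(x[:hx]), statistics.fmean(y[:hy])
    cut = 0.5 * (mx + my)
    x_is_low = mx <= my
    # Test on the second halves.
    correct = sum((v <= cut) == x_is_low for v in x[hx:])
    correct += sum((v > cut) == x_is_low for v in y[hy:])
    return correct / (len(x) - hx + len(y) - hy)

def simulate(theta, n, rng):
    """Hypothetical simulator: n iid draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

rng = random.Random(0)
observed = simulate(0.0, 2000, rng)
acc_same = classification_accuracy(observed, simulate(0.0, 2000, rng))
acc_far = classification_accuracy(observed, simulate(2.0, 2000, rng))
```

An ABC-like scheme would then retain the values of θ for which the accuracy remains near chance level, with the classifier playing the role usually devoted to summary statistics and a distance.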
A side question: I have obviously never thought about that before, but why is the naïve Bayes classification rule so called?! It never sounded very Bayesian to me to (a) use the true value of the parameter and (b) average the classification performances. Interestingly, the authors show (i) identical performances of other classification methods (Fig. 2) and (ii) an exception for MA time series: when we first experimented with random forests, raw data from an MA(2) model was tested to select between MA(1) and MA(2) models, and the performances of the resulting random forest were quite poor.
Now, one difference between our two approaches is that Michael and his coauthors also include point estimation within the range of classification-based ABC inference. As we stressed in our paper, we restrict the range to classification and model choice because we do not think those machine learning tools are stable and powerful enough to perform regression and posterior probability approximation. I also see a practical weakness in the estimation scheme proposed in this new paper, namely that the Monte Carlo error gets in the way of the consistency theorem. And possibly of the simulation method itself. Another remark is that, while the authors compare the fit produced by different classification methods, there should be a way to aggregate them towards higher efficiency. Returning once more to our random forest paper, we saw improved performances each time we included a reference method, from LDA to SVMs. It would be interesting to see a (summary) variable selection version of the proposed method. A final remark is that computing time and effort do not seem to get mentioned in the paper (unless Indian jetlag confuses me more than usual). I wonder how fast the computing effort grows with the sample size to reach parametric and quadratic convergence rates.