Archive for asynchronous algorithms

optimal importance sampling

Posted in Books, Statistics, Travel, University life on January 13, 2016 by xi'an

[Picture: somewhere near Zürich, Jan. 4, 2016]

An arXiv paper that sat for quite a while in my to-read pile is Variance reduction in SGD by distributed importance sampling by Alain et al. I had to wait for the flight to Zürich and MCMskv to get a look at it. The part of the paper of primary interest to me is the generalisation of the optimal importance function result

$$q^\star(x) \propto |h(x)|\,f(x)$$

to higher dimensions. Namely, what is the best importance function for approximating the expectation of h(X) when h is multidimensional? There does exist an optimal solution when the score function is the trace of the variance matrix, in which case the solution is proportional to the target density times the norm of the target integrand

$$q^\star(x) \propto f(x)\,\|h(x)\|_2$$

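The trace-optimality claim is easy to check numerically. Below is a minimal sketch on a hypothetical discrete toy problem (the target f, the integrand h and all their values are made up for illustration, not taken from Alain et al.), where the covariance of the one-draw importance sampling estimator f(X)h(X)/q(X), X ~ q, can be computed exactly instead of simulated. Its trace equals the sum over x of f(x)²‖h(x)‖²/q(x) minus ‖E_f[h(X)]‖², and by the Cauchy-Schwarz inequality this is minimised by q ∝ f‖h‖, where it equals (E_f‖h(X)‖)² − ‖E_f[h(X)]‖².

```python
import math
import random

# Hypothetical toy problem (illustration only): discrete target f on four
# support points and a two-dimensional integrand h with mixed signs.
f = [0.1, 0.2, 0.3, 0.4]
h = [(1.0, 4.0), (2.0, -1.0), (3.0, 2.0), (-0.5, 3.0)]

# Exact target expectation E_f[h(X)], one entry per component.
mu = [sum(fx * hx[c] for fx, hx in zip(f, h)) for c in range(2)]


def trace_variance(q):
    """Exact trace of Cov(f(X) h(X) / q(X)) for a single draw X ~ q."""
    second = sum(fx**2 * (hx[0]**2 + hx[1]**2) / qx
                 for fx, hx, qx in zip(f, h, q))
    return second - (mu[0]**2 + mu[1]**2)


def normalise(w):
    s = sum(w)
    return [wi / s for wi in w]


# Trace-optimal proposal: q(x) proportional to f(x) * ||h(x)||_2.
q_opt = normalise([fx * math.hypot(*hx) for fx, hx in zip(f, h)])

# Random proposals on the same support, for comparison.
rng = random.Random(0)
random_qs = [normalise([rng.random() + 1e-3 for _ in f]) for _ in range(1000)]

print("plain Monte Carlo (q = f):", trace_variance(f))
print("q proportional to f*||h||:", trace_variance(q_opt))
print("best of 1000 random q:   ", min(trace_variance(q) for q in random_qs))
```

The random proposals only serve as a sanity check on the optimum; in this toy case the gain over plain Monte Carlo (q = f) is modest, since the norm of h does not vary much over the support.
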
The application of the result to neural networks and stochastic gradients using minibatches of the training set somehow escapes me, even though the asynchronous aspects remind me of the recent asynchronous Gibbs sampler of Terenin, Draper, and Simpson.

While the optimality obtained in the paper is mathematically clear, I am a wee bit surprised at the approach: the lack of a normalising constant in the optimum means using a reweighted approximation that drifts away from the optimal score. Furthermore, this optimum is sub-optimal when compared with the component-wise optima, which produce a variance of zero for integrand components of constant sign (if we assume the normalising constants to be available). Obviously, using the component-wise optima requires running as many simulations as there are components in the integrand, but since cost does not seem to be central to this study…
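Both remarks can be illustrated on another hypothetical discrete toy problem (all values made up, with positive integrand components so that the component-wise optima are genuine probability distributions). The sketch below computes the exact variances under the component-wise optima q_c ∝ f h_c, which vanish but require one separate simulation per component, and compares them with the trace under the joint optimum q ∝ f‖h‖. It then runs a self-normalised importance sampler that only uses the unnormalised joint optimum, i.e. the kind of reweighted approximation one falls back on when the normalising constant is unavailable (in this toy case it is of course computable, so the setting is pretend).

```python
import math
import random

# Hypothetical toy problem (illustration only); all components of h are
# positive so that each q_c below is a proper distribution.
f = [0.1, 0.2, 0.3, 0.4]
h = [(1.0, 4.0), (2.0, 1.0), (3.0, 2.0), (0.5, 3.0)]
mu = [sum(fx * hx[c] for fx, hx in zip(f, h)) for c in range(2)]


def is_variance(q, c):
    """Exact variance of f(X) h_c(X) / q(X) for a single draw X ~ q."""
    second = sum(fx**2 * hx[c]**2 / qx for fx, hx, qx in zip(f, h, q))
    return second - mu[c]**2


def normalise(w):
    s = sum(w)
    return [wi / s for wi in w]


# Component-wise optima q_c proportional to f * h_c: zero variance for
# component c, at the price of one simulation per component.
q_comp = [normalise([fx * hx[c] for fx, hx in zip(f, h)]) for c in range(2)]
comp_vars = [is_variance(q_comp[c], c) for c in range(2)]

# Joint (trace-optimal) proposal q proportional to f * ||h||.
q_joint = normalise([fx * math.hypot(*hx) for fx, hx in zip(f, h)])
joint_trace = sum(is_variance(q_joint, c) for c in range(2))


def self_normalised_estimate(n, rng):
    """Estimate E_f[h(X)] by sampling from the unnormalised f * ||h||.

    random.choices accepts relative (unnormalised) weights; the importance
    weights f / q_tilde are then renormalised to sum to one, which makes the
    estimator consistent but biased for finite n.
    """
    q_tilde = [fx * math.hypot(*hx) for fx, hx in zip(f, h)]
    idx = rng.choices(range(len(f)), weights=q_tilde, k=n)
    w = [f[i] / q_tilde[i] for i in idx]
    total = sum(w)
    return [sum(wj * h[i][c] for wj, i in zip(w, idx)) / total for c in range(2)]


est = self_normalised_estimate(100_000, random.Random(1))
print("component-wise variances:", comp_vars)
print("trace under joint optimum:", joint_trace)
print("self-normalised estimate:", est, "exact:", mu)
```

The zero variances only hold because each h_c keeps a constant sign here; with a sign-changing component, q_c ∝ f|h_c| no longer yields a zero-variance estimator.
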

MCMskv #5 [future with a view]

Posted in Kids, Mountains, R, Statistics, Travel, University life on January 12, 2016 by xi'an

As I am flying back to Paris (with an afternoon committee meeting in München in between), I am reminiscing about the superlative scientific quality of this MCMski meeting, about the novel directions in computational Bayesian statistics exhibited therein, and about the potential settings for the next meeting. If any.

First, as hopefully obvious from my previous entries, I found the scientific program very exciting, with almost uniformly terrific talks and a coverage of the field of computational Bayesian statistics that is perfectly tuned to my own interests. In that sense, MCMski is my “top one” conference! Even without considering the idyllic location. While some of the talks were about papers I had already read (and commented on here), others brought new vistas and ideas. If one theme is to emerge from this meeting, it has to be that of approximate and noisy algorithms, with a wide variety of solutions and approaches to overcome complexity issues. If anything, I wish the solutions would also incorporate the Boxian fact that the statistical models themselves are approximate. Overall, a fantastic program (says one member of the scientific committee).

Second, as with previous MCMski meetings, I again enjoyed the unique ambience of the meeting, which always feels more relaxed and friendly than other conferences of a similar size, maybe because of the après-ski atmosphere or of the special coziness provided by luxurious mountain hotels. This year's hotel was particularly pleasant, with non-guests like myself able to partake of some of its facilities. A big thank you to Anto for so meticulously arranging all the details of such a large meeting!!! I am even more grateful when realising this is the third time Anto has taken on the heavy load of organising MCMski. Grazie mille!

Since this is a [and even the!] BayesComp conference, the current section program chair and board must decide on the structure and schedule of the next meeting. A few suggestions, if I may: I would scrap the name MCMski entirely from the next conference, as (a) it may sound like academic tourism to unaware bystanders (who only need to check the program of any of the MCMski conferences to stand reassured!) and (b) its topics go way beyond MCMC. Given the large attendance and the equally large proportion of young researchers, I would also advise against hosting the conference in a ski resort, for both cost and accessibility reasons [as we had already discussed after MCMskiv], in favour of a town large enough to offer a reasonable range of accommodations and of travel options. Like Chamonix, Innsbruck, Reykjavik, or any place with a major airport about one hour away… If nothing is available with skiing possibilities, so be it! While the outdoor inclinations of the early organisers induced us to pick locations where skiing over the lunch break was a perk, any accessible location that allows for a concentration of researchers in a small area and for the ensuing day-long exchanges is fine! Among the novelties in the program, the tutorials and the Breaking news! sessions were quite successful (says one member of the scientific committee), and should be continued in one format or another. Maybe a more programming-oriented thread could be added as well… And, as we had mentioned earlier, seeing a stronger involvement of the Young Bayesian section in the program would be great! (Even though the current meeting already had many young researcher talks.)

MCMskv #1 [room with a view]

Posted in Mountains, pictures, Statistics, Travel, University life on January 6, 2016 by xi'an

That’s it! MCMskv has now started! We held our round-table on Monday night, which ended with most of my interventions revolving around the importance of models. And around the fact that models are always approximate (and wrong), hence that uncertainty and uncertainty ascertainment are paramount. Even more so with large datasets and high-dimensional models. Apologies to the audience if I sounded like I was running on a very short loop. (And maybe also for the round-table keeping them from their dinner!) Still, I got some items for reflection out of this discussion, including the notion that big data is usually, and inappropriately, associated with an impression of completeness that is almost deterministic in a Laplacian sense. Namely, that the available data for, say, all Facebook users seems to allow us (or The Machine) to play Laplace’s Demon, and thus to forgo the need for uncertainty and uncertainty ascertainment. Which obviously clashes with the issues of poor data, inappropriate models, and time or space stationarity of the available information.

Two more computing-related notions that came out of the discussion [for me] are asynchronicity (in the sense explored by Terenin et al. a few months ago) and subsampling. The latter seems to mean many things, judging from the discussion among the panel and the audience. For me, it corresponded to the ability (or inability) to handle only part of the available data when simulating the posterior associated with all of this available data.

The first talk on Tuesday morning was the plenary talk by Michael Jordan about his incorporation of complexity constraints on the convergence of an MCMC variable selection algorithm. (I thought I had commented on this paper in the past on the ‘Og but apparently I did not!) This was quite interesting, with ultra-fast convergence of the sampler. The talk was alas made harder to follow because of a cameraman standing in front of most of the audience for the entire time, as in the above picture. (I also noticed the interesting randomness of the light panels, which all display different patterns of dots, maybe random enough to satisfy a randomness test!) Another, if irrelevant, annoying fact was that I discovered upon arrival that my airbnb rental was located 8 kilometres away from the conference location, in a completely different town! Thankfully, we had rented a car [for 5], which saved the day (and even more the night!).