Archive for MCMskv

MCMskv #4 [house with a view]

Posted in Statistics on January 9, 2016 by xi'an

Last day at MCMskv! Not yet exhausted by this exciting conference, but this was the toughest day, with one more session and a tutorial by Art Owen on quasi-Monte Carlo. (Not even mentioning the night activities that I skipped, or the ski break that I did not even consider.) Krys Latuszynski started with a plenary on exact methods for discretised diffusions, with a foray into Bernoulli factory problems. Then came a neat session on adaptive MCMC methods that contained a talk by Chris Sherlock on delayed acceptance, where the approximation to the target was built by k-nn trees. (The adaptation went through the construction of the tree, by including additional evaluations of the target density. Another paper that has been sitting in my to-read list for too long a while: the exploitation of the observed values of π towards improving an MCMC sampler has always been “obvious” to me, even though I could not see any practical way of doing so.)
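
A side remark for readers unfamiliar with delayed acceptance: the two-stage mechanism fits in a few lines of R. The sketch below uses a generic cheap surrogate in place of the k-nn construction, so every name and tuning choice in it is mine, not Chris Sherlock's.

    ## delayed-acceptance Metropolis-Hastings, minimal sketch: screen each
    ## proposal with a cheap approximation first, then correct the survivors
    ## with the expensive target, preserving the exact stationary distribution
    da_mh <- function(log_pi, log_pi_cheap, n = 1e4, x0 = 0, scale = 1) {
      x <- numeric(n); x[1] <- x0
      for (t in 2:n) {
        y <- x[t - 1] + scale * rnorm(1)   # symmetric random-walk proposal
        # stage 1: accept/reject on the cheap approximation only
        if (log(runif(1)) < log_pi_cheap(y) - log_pi_cheap(x[t - 1])) {
          # stage 2: correct with the exact (expensive) target density
          ratio <- (log_pi(y) - log_pi(x[t - 1])) -
                   (log_pi_cheap(y) - log_pi_cheap(x[t - 1]))
          x[t] <- if (log(runif(1)) < ratio) y else x[t - 1]
        } else x[t] <- x[t - 1]
      }
      x
    }
    ## toy run: N(0,1) target screened by a Cauchy-shaped surrogate
    out <- da_mh(function(x) -x^2 / 2, function(x) -log(1 + x^2))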

It was wonderful that Art Owen agreed to deliver a tutorial at MCMskv on quasi-random Monte Carlo. Great tutorial, with a neat coverage of the issues most related to Monte Carlo integration. Since quasi-random sequences have trouble with accept/reject methods, a not-even-half-baked idea that came to me during Art’s tutorial was that the increased computing power granted by qMC could lead to a generic integration of the Metropolis-Hastings step in a Rao-Blackwellised manner. Art mentioned he was hoping that in the near future one could switch between pseudo- and quasi-random in an almost automated manner when running standard platforms like R. This would indeed be great, especially since quasi-random sequences seem to be available at the same cost as their pseudo-random counterparts. During the following qMC session, Art discussed the construction of optimal sequences on sets other than hypercubes (with the surprising feature that projecting optimal sequences from the hypercube does not work). Mathieu Gerber presented the quasi-random simulated annealing algorithm he developed with Luke Bornn, which I briefly discussed a while ago. Or so I thought, as I cannot trace a post on that paper! While the fact that annealing also works with quasi-random sequences is not astounding, the gain over random sequences shown on two examples is clear. The session also had a talk by Lester Mackey, who relies on Stein’s discrepancy to measure the value of an approximation to the true target. This was quite novel, with a surprising connection to Chris Oates’ talk and the use of score-based control variates, if used in a dual approach.
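
To make the pseudo-versus-quasi comparison concrete on the simplest of integrals, here is a self-contained sketch, with a hand-rolled van der Corput sequence standing in for a proper qMC construction (so the code is only an illustration of the principle, not of Art's material):

    ## pseudo- vs quasi-random estimation of E[exp(U)] = e - 1, U ~ U(0,1)
    van_der_corput <- function(n, base = 2) {
      sapply(1:n, function(k) {
        q <- 0; b <- 1 / base
        while (k > 0) { q <- q + (k %% base) * b; k <- k %/% base; b <- b / base }
        q
      })
    }
    n <- 2^10
    abs(mean(exp(runif(n))) - (exp(1) - 1))           # pseudo-random error, O(n^-1/2)
    abs(mean(exp(van_der_corput(n))) - (exp(1) - 1))  # quasi-random error, close to O(1/n)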

Another great session was the noisy MCMC one organised by Paul Jenkins (Warwick), with again a coherent presentation of views on the quality, or lack thereof, of noisy (or inexact) versions of MCMC algorithms, with an update from Richard Everitt on inexact MCMC, Felipe Medina Aguayo (Warwick) on sufficient conditions for noisy versions to converge (and counterexamples), Jere Koskela (Warwick) on a pseudo-likelihood approach to the highly complex Kingman’s coalescent model in population genetics (of ABC fame!), and Rémi Bardenet on the tall data approximation techniques discussed in a recent post. Having seen or read most of those results previously did not diminish the appeal of the session.
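
For readers new to the topic, the generic noisy Metropolis-Hastings step the session revolved around can be sketched as below, with the (unbiased or merely consistent) likelihood estimator left abstract; whether such a chain still converges, and to what, was precisely the question addressed by the speakers. The code is a minimal sketch of mine, not taken from any of the talks.

    ## noisy Metropolis-Hastings, minimal sketch: the intractable target is
    ## replaced by an estimator lik_hat(), re-drawn afresh at every iteration;
    ## contrary to the pseudo-marginal version (which recycles the estimate at
    ## the current value), this chain only targets an approximation
    noisy_mh <- function(lik_hat, n = 1e4, x0 = 0, scale = 1) {
      x <- numeric(n); x[1] <- x0
      for (t in 2:n) {
        y <- x[t - 1] + scale * rnorm(1)         # symmetric random-walk proposal
        ratio <- lik_hat(y) / lik_hat(x[t - 1])  # both estimates re-drawn
        x[t] <- if (runif(1) < ratio) y else x[t - 1]
      }
      x
    }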

MCMskv #3 [town with a view]

Posted in Statistics on January 8, 2016 by xi'an

Third day at MCMskv, where I took advantage of the gap left by the elimination of the Tweedie Race [the second time in a row!] to complete and submit our mixture paper. Despite the nice weather. The rest of the day was quite busy, with David Dunson giving a plenary talk on various approaches to approximate MCMC solutions, with a broad overview of the potential methods and of the need for better solutions. (On a personal basis, great line from David: “five minutes or four minutes?”. It almost beat David’s question on the previous day, about the weight of a finch, which sounded suspiciously close to the question about the air-speed velocity of an unladen swallow. I was quite surprised the speaker did not reply with the Arthurian “An African or a European finch?”) In particular, I appreciated the notion that some problems call for a reduction in the number of parameters, rather than in the number of observations. At which point I wrote down “multiscale approximations required” in my black pad, a requirement David stated a few minutes later. (The talk conditions were also much better than during Michael’s talk, in that the man standing between the screen and myself was David rather than the cameraman! Joking apart, it did not really prevent me from reading the slides, except for most of the jokes in small print!)

The first session of the morning involved a talk by Marc Suchard, who used continued fractions to find a closed-form likelihood for the SIR epidemiology model (I love continued fractions!), and a talk by Donatello Telesca, who studied non-local priors to build a regression tree. While I am somewhat skeptical about non-local testing priors, I found this approach to the construction of a tree quite interesting! In the afternoon, I obviously went to the intractable likelihood session, with talks by Chris Oates on a control variate method for doubly intractable models, Brenda Vo on mixing sequential ABC with the Bayesian bootstrap, and Gael Martin on our consistency paper. I was not aware of the Bayesian bootstrap proposal and need to read through the paper, as I fail to see the appeal of the bootstrap part! I later attended a session on exact Monte Carlo methods that was pleasantly homogeneous, with talks by Paul Jenkins (Warwick) on the exact simulation of the Wright-Fisher diffusion, Anthony Lee (Warwick) on designing perfect samplers for chains with atoms, and Chang-han Rhee and Sebastian Vollmer on extensions of the Glynn-Rhee debiasing technique I previously discussed on the blog. (Once again, I regretted having to make a choice between the parallel sessions!)
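
Since the Glynn-Rhee technique lends itself to a compact illustration, here is a sketch of the random-truncation telescoping estimator on a toy expectation (rather than on an actual Markov chain, and with all tuning choices mine): sampling a geometric truncation level N and reweighting the increments v_k − v_{k−1} by 1/P(N ≥ k) delivers an unbiased estimator of the limit of the v_k’s.

    ## Glynn-Rhee debiasing, minimal sketch: unbiased estimation of
    ## E[exp(Z)] = exp(1/2), Z ~ N(0,1), from a randomly truncated telescoping
    ## sum of coupled estimators v_k, each based on the first 2^k common draws
    rhee_glynn <- function(p = 0.3) {
      N <- rgeom(1, p) + 1          # truncation level, P(N >= k) = (1-p)^(k-1)
      z <- exp(rnorm(2^N))          # one common stream couples all levels
      est <- 0; prev <- 0
      for (k in 1:N) {
        cur <- mean(z[1:2^k])       # v_k: estimator using the first 2^k draws
        est <- est + (cur - prev) / (1 - p)^(k - 1)
        prev <- cur
      }
      est
    }
    mean(replicate(1e4, rhee_glynn()))  # fluctuates around exp(.5) = 1.6487...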

The poster session (after a quick home-made pasta dish with an exceptional Valpolicella!) was almost universally great, with just the right number of posters to get around all of them in the allotted time. With in particular the Breaking News! posters of Giacomo Zanella (Warwick), Beka Steorts and Alexander Terenin. A high-quality session that made me regret not touring the previous one, due to my own poster presentation.

MCMskv #2 [ridge with a view]

Posted in Mountains, pictures, R, Statistics, Travel, University life on January 7, 2016 by xi'an

Tuesday at MCMskv was a rather tense day for me, from having to plan the whole day “away from home” [8km away], to the mundane worry of renting ski equipment and getting to the ski runs over the noon break, to giving a poster on our new mixture paper with Kaniav Kamary and Kate Lee, as Kaniav could not get a visa in time. It actually worked out quite nicely, with almost Swiss efficiency. After Michael Jordan’s talk, I attended a Bayesian molecular biology session with an impressive talk by Jukka Corander on evolutionary genomics with novel ABC aspects. And then a Hamiltonian Monte Carlo session with two deep talks by Sam Livingstone and Elena Akhmatskaya on the convergence of HMC, followed by an amazing entry into Bayesian cosmology by Jens Jasche (with the slight drawback that the MCMC simulations took about a calendar year, handling over 10⁷ parameters). Finishing the day with more “classical” MCMC convergence results and techniques, with talks about forgetting time, stopping time (an undervalued alternative to convergence controls), and CLTs. Including a multivariate ESS by James Flegal. (This choice of sessions was uniformly frustrating, as I was equally interested in “the other” session each time. The drawback of running parallel sessions, obviously.)
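
For the record, and from memory rather than from the talk itself, the multivariate ESS of Vats, Flegal and Jones replaces the usual ratio of variances with a ratio of determinants: for a chain of length n in dimension p, with sample covariance Λ and estimated asymptotic covariance Σ of the chain,

    \mathrm{mESS} = n \left( \frac{\det \Lambda}{\det \Sigma} \right)^{1/p},

which reduces to the familiar univariate ESS when p = 1.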

The poster session was busy and animated, but alas I could not get an idea of the other posters as I was presenting mine. This was quite exciting, as I discussed a new parametrisation for location-scale mixture models that allows for a rather straightforward “non-informative” or reference prior. (The paper with Kaniav Kamary and Kate Lee should be arXived overnight!) The recently deposited CRAN package Ultimixt, by Kaniav and Kate, contains Metropolis-Hastings functions related to this new approach. The result is quite exciting, especially because I have been looking for it for decades, and I will discuss it pretty soon in another post. I also had great exchanges with the conference participants, which led me to consider the reparametrisation on a larger scale and to simplify the presentation of the approach, turning the global mean and variance into hyperparameters.
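
For concreteness, the elementary moment identities that such a reparametrisation can exploit are, for a Gaussian mixture with weights w_i (this is a sketch of the idea, not the exact formulation of our paper):

    \mu = \sum_{i=1}^k w_i\,\mu_i, \qquad
    \sigma^2 = \sum_{i=1}^k w_i\,(\sigma_i^2 + \mu_i^2) - \mu^2,

so that, once the global mean μ and variance σ² are set as hyperparameters, the component means and standard deviations are constrained to a compact set and can be assigned a proper “non-informative” prior.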

The day was also most auspicious for a ski break, as it was very mild and sunny, while the snow conditions were (somewhat) better than the ones we had in the French Alps two weeks ago. (Too bad that the Tweedie ski race had to be cancelled for lack of snow on the reserved run! The Blossom ski reward will again have to be randomly allocated!) Just not exciting enough to consider another afternoon out, given the tension in getting there and back. (And especially when considering that it took me the entire break time to arXive our mixture paper…)

MCMskv #1 [room with a view]

Posted in Mountains, pictures, Statistics, Travel, University life on January 6, 2016 by xi'an

That’s it, MCMskv has now started! We held our round-table Monday night, which ended with most of my interventions revolving around the importance of models, and the fact that models are always approximate (and wrong), hence that uncertainty and uncertainty ascertainment are paramount. Even more so with large datasets and high-dimensional models. Apologies to the audience if I sounded like running on a very short loop. (And maybe also for the round-table keeping them from their dinner!) Still, I got some items for reflection out of this discussion, including the notion that big data is usually and inappropriately associated with an impression of completeness that is almost deterministic in a Laplacian sense. Namely, that the available data for, say, all Facebook users seems to allow us (or The Machine) to play Laplace’s Demon, and thus forgoes the need for uncertainty and uncertainty ascertainment. Which obviously clashes with the issues of poor data, inappropriate models, and time or space stationarity of the available information.

Two more computing-related notions that came out of the discussion [for me] are asynchronicity (in the sense explored by Terenin et al. a few months ago) and subsampling. The latter seems to mean many things, judging from the discussion between the panel and the audience. For me, it corresponds to the ability (or inability) to handle only part of the available data when simulating from the posterior associated with the whole of this data.

The first talk on Tuesday morning was the plenary talk by Michael Jordan, about his incorporation of complexity constraints on the convergence of an MCMC variable selection algorithm. (I thought I had commented on this paper in the past on the ’Og, but apparently I did not!) This was quite interesting, with ultra-fast convergence of the sampler. The talk was alas made harder to follow because of a cameraman standing in front of most of the audience for the entire time, as in the above picture. (I also noticed the interesting randomness of the light panels, which all display different patterns of dots, maybe random enough to satisfy a randomness test!) Another, if irrelevant, annoying fact was that I discovered upon arrival that my airbnb rental was located 8 kilometres away from the conference location, in a completely different town! Thankfully, we had rented a car [for 5], which saved the day (and even more the night!).

(more) years of data science

Posted in Mountains, Statistics, University life on January 4, 2016 by xi'an

Here is David Draper’s discussion on David Donoho’s 50 Years of Data Science:

This was a good choice for a jumping-off point for a round-table discussion on the Future of Data Science; David Donoho hits a number of cogent nails on the head, and also leaves room for other perspectives (if all the round-table participants had written their own versions of Donoho’s paper, the 9 different experiential paths the participants have taken would have resulted in 9 quite different versions). The same issue applies to Donoho’s paper: he’s superb on things in his experiential path about which he’s thought carefully, but (like all of us) he has experientially-driven blind spots.

I write from the point of view of an academic statistician — working in all three of theory, methodology, and applications — with a total of about 3 years of industrial experience in Data Science at research labs at eBay and Amazon; I’ve also talked at length with statisticians at Google and Facebook, and I’ve given Data Science seminars at all four companies. To date I’ve worked on the following Data Science problems:
• optimal design and analysis of A/B tests (randomized controlled experiments) with 10–100 million subjects in each of the treatment and control groups;
• optimal design and analysis of observational studies (because randomized experiments are not always possible), again with 10–100 million subjects in each arm of the study;
• one-step-ahead forecasts of 1–100 million (related) non-stationary time series, for the purpose of anomaly detection; and
• multi-step-ahead forecasts of 30 million (related) non-stationary time series, to support optimal inventory control decisions.

My blind spots (at least the ones I know about) include (a) less familiarity with machine learning and econometrics than I would like and (b) no personal experience with what below is called 2016-style High-Performance Computing (although two of my Ph.D. students are currently dragging me into the 21st century). My comments are aimed primarily at statisticians who want to become Data Scientists.

years (and years) of data science

Posted in Books, Statistics, Travel, University life on January 4, 2016 by xi'an

In preparation for the round table at the start of the MCMskv conference this afternoon, Anto sent us a paper written by David Donoho for the Tukey Centennial workshop, held in Princeton last September. Entitled 50 Years of Data Science. And which attracted a whole round of comments, judging from the Google search results. So much so that I decided not to read any of them before parsing through the paper. Hence almost certainly reproducing here, with my two cents, some of the previous comments.

“John Tukey’s definition of `Big Data’ was `anything that won’t fit on one device’.”

The complaint that data science is essentially statistics that does not dare to spell out statistics, as if it were a ten-letter word (p.5), is not new, if appropriate. In this paper, David Donoho evacuates the memes that supposedly separate data science from statistics, like “big data” (although I doubt non-statisticians would accept the quick rejection that easily, wondering at the ability of statisticians to develop big models), skills like parallel programming (which ineluctably leads to more rudimentary algorithms and inferential techniques), and jobs requiring such a vast array of skills and experience that no graduate student sounds properly trained for them…

“A call to action, from a statistician who feels ‘the train is leaving the station’.” (p.12)

One point of the paper is to see John Tukey’s 1962 “The Future of Data Analysis” as prophetic of the “Big Data” and “Data Science” crises. Which makes a lot of sense when considering the four driving forces advanced by Tukey (p.11):

  1. formal statistics
  2. advanced computing and graphical devices
  3. the ability to face ever-growing data flows
  4. its adoption by an ever-wider range of fields

“Science about data science will grow dramatically in significance.”

David Donoho then moves on to incorporate Leo Breiman’s 2001 Two Cultures paper, which separates machine learning and prediction from statistics and inference, leading to the “big chasm”! And he sees the combination of prediction with the “common task framework” as the “secret sauce” of machine learning, because of the possibility of objectively comparing methods on a testing dataset. Which does not seem to me to be the explanation for the current (real or perceived) disaffection for statistics, and the correlated attraction for more computer-related solutions. A code that wins a Kaggle challenge clearly has some efficient characteristics, but this tells me nothing of the abilities of the methodology behind that code. If any. Self-learning how to play chess within 72 hours is great, but is the principle behind it able to handle go at the same level? Plus, I remain worried about the (screaming) absence of model (or models) in predictive approaches. Or at least skeptical. For the same reason, it does not help in producing a generic approach to problems, nor an approximation to the underlying mechanism. I thus see nothing but a black box in many “predictive models”, which tells me nothing about the uncertainty, imprecision or reproducibility of such tools. “Tool evaluation” cannot be reduced to a final score on a testing benchmark. The paper concludes with the prediction that the validation of scientific methodology will solely be empirical (p.37). This leaves little ground, if any, for probability and uncertainty quantification, as reflected by their absence from the paper.

MCMskv, Lenzerheide, 4-7 Jan., 2016 [breaking news #6]

Posted in Kids, Mountains, pictures, Travel, University life on December 2, 2015 by xi'an

As indicated in an earlier MCMskv news item, the scientific committee kept a session open for Breaking news! proposals, in conjunction with poster submissions. We received 21 proposals and managed to squeeze 12 fifteen-minute presentations into an already tight program. (I advise all participants to take a relaxing New Year break and to load up on vitamins and such, in preparation for a 24/7, or rather 24/3, relentless and X’citing conference!) Here are the selected presentations, with (some links to my posts on the related papers and) abstracts available on the conference website. Note to all participants that there are still a few days left for submitting posters!

Luke Bornn

Jon Cockayne

Gersende Fort

Michael Gutmann

James Johndrow

Jean-Michel Marin

Murray Pollock

Maxim Rabinovich

Rebecca Steorts

Alexander Terenin

Yazhen Wang

Giacomo Zanella