## future of computational statistics

I am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays into accelerated computing via parallel and cloud architectures. The breadth and quality of the talks at MCMski IV is testimony to this.

A second trend, not unrelated to the first, is the development of new techniques and the rehabilitation of older ones to handle complex models by approximations: witness ABC, expectation propagation, variational Bayes, &tc. A corollary is a healthy questioning of the models themselves, as illustrated for instance in Chris Holmes’ talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I remain wary of the “inevitability” of turning statistics into an “optimize+penalize” tunnel vision…

A third characteristic is the emergence of new languages and meta-languages intended to handle the complexity both of problems and of solutions, towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that another scale of language is now required…
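For readers less familiar with the MCMC machinery mentioned above, here is a minimal sketch of the random-walk Metropolis algorithm, the baseline that Stan, JAGS, and the accelerated variants all build upon. This is a generic illustration, not code from the post; the function name and tuning values are mine, and a standard normal stands in as a toy target.

```python
import math
import random

def metropolis_hastings(log_target, init, n_iter=10000, scale=1.0, seed=42):
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    rng = random.Random(seed)
    x = init
    samples = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, scale)  # symmetric Gaussian proposal
        # accept with probability min(1, pi(proposal)/pi(x)), done on the log scale
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

# Toy target: standard normal, log-density known only up to an additive constant
draws = metropolis_hastings(lambda t: -0.5 * t * t, init=0.0)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Because the acceptance ratio only involves a ratio of densities, the normalizing constant is never needed, which is precisely what makes the method so portable across complex models.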

If you have any suggestions of novel directions in computational statistics, or of dead ends instead, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics…

October 30, 2014 at 6:39 pm

I came across your blog post through Yee Whye. This is a topic I am interested in, so I thought I would offer a few suggestions that seem relevant but are not mentioned above.

One is a pointer to stochastic gradient descent estimators that leverage stochastic approximation à la Robbins & Monro (1951), and related research on proximal methods. My student Panos Toulis and I have a recent paper in this area: http://arxiv.org/abs/1408.2923. Our angle is to introduce and explore “implicit” stochastic gradient descent estimators. What people may find useful is the discussion of the broader literature in Section 2.5. A notable paper in this area is here: http://arxiv.org/abs/1306.2119, by Bach and Moulines.
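To make the explicit/implicit distinction concrete, here is a sketch for the simplest case, a one-dimensional least-squares model — my own illustrative example, not the estimator of the paper in its full generality. The explicit update evaluates the gradient at the current iterate; the implicit one evaluates it at the next iterate, which for a quadratic loss admits a closed form:

```python
import random

def implicit_sgd_lsq(stream, lr0=1.0):
    """Implicit SGD for the one-dimensional least-squares model y ≈ theta * x.

    Explicit SGD evaluates the gradient at the current iterate:
        theta_n = theta_{n-1} + a_n * x_n * (y_n - x_n * theta_{n-1})
    The implicit variant evaluates it at the *next* iterate:
        theta_n = theta_{n-1} + a_n * x_n * (y_n - x_n * theta_n)
    which, for this quadratic loss, solves in closed form as
        theta_n = (theta_{n-1} + a_n * x_n * y_n) / (1 + a_n * x_n ** 2)
    """
    theta = 0.0
    for n, (x, y) in enumerate(stream, start=1):
        a = lr0 / n  # Robbins-Monro step sizes: sum a_n diverges, sum a_n^2 converges
        theta = (theta + a * x * y) / (1.0 + a * x * x)
    return theta

# Simulated data stream from y = 2x + small noise
rng = random.Random(1)
stream = []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    stream.append((x, 2.0 * x + 0.1 * rng.gauss(0.0, 1.0)))
theta_hat = implicit_sgd_lsq(stream)
```

The division by 1 + a_n x_n² is what gives the implicit scheme its shrinkage flavor and its robustness to misspecified learning rates, which is one of the selling points of that line of work.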

The other pointer is to a paper by Michael Jordan that appeared in Bernoulli (http://projecteuclid.org/euclid.bj/1377612856), where he summarizes some examples of strategies for scaling inference methods to massive datasets. You may also find of interest the slides of a talk he gave at the 2014 IMS New Researchers Conference (http://www.stat.harvard.edu/NRC2014/MichaelJordan.pdf), where he explores conceptual and mathematical challenges that arise in big-data settings. He concludes that facing these challenges will require a rapprochement between computer science and statistics, bringing them together at the level of their foundations and thus reshaping both disciplines.

I look forward to reading your survey paper!

September 30, 2014 at 1:02 am

[…] article was first published on Xi’an’s Og » R, and kindly contributed […]

September 29, 2014 at 4:42 pm

The term “Computational Statistics” reminds me of what one of my brilliant colleagues in statistical genetics used to say about people working in that area. He claimed there were two categories (A & B):

A) First: compute and second: think

B) First: think and second: compute

It was supposed to be a joke, but most of those supporting B had been in a state of sin, practising A without saying it. As usual, the world, even in science, is not black and white! There are many examples of statistical techniques that were discovered by trial and error before being fully justified theoretically, e.g. BLUP, EM, ABC, …

In a time of “Computing Über Alles” (and kindred software), what about advocating a feedback loop between “computing” and “thinking” as a realistic attitude towards “Computational Statistics”?

Regarding alternative languages, I have been impressed by the increasing use of graphical models over the last decade, on both the applied and theoretical fronts. Oddly enough, I was taught quantitative genetics in the 70’s using the “path coefficients” of Sewall Wright (highly criticized in that respect), so I was not so disoriented when such techniques came back 30 years later.

September 29, 2014 at 5:02 am

Other than the lingering suspicion that “computational statistics” is an increasingly redundant term, I don’t have much to add. (You can’t do adequate statistics without computation, and you can’t design decent computational algorithms that are agnostic to the application at hand.)

Except maybe to reinforce Chris Holmes’ larger point (which I may be pulling from the thin blue sky), that computing (or “computing”) the entire posterior is not a decent substitute for knowing what you actually want from the data. (This is especially true in high dimensions!)

Or that asymptotic arguments are meaningless white noise unless those regimes are demonstrated to be reachable in practice. (But I’m a cynic)

Or possibly to reiterate the fundamental law of computational maths: whatever you think you’re computing, you’re not.

September 29, 2014 at 3:50 pm

title should have been future of [computational] statistics then…

September 29, 2014 at 6:12 pm

[The] [future] of [computational] [statistics] :p (I just spent 24 hours cleaning before cleaners came, waiting for cleaners to finish and then cleaning after they were done and I am now homeless, so I may not be the best judge of the future, computation, statistics, or definite articles. And, to be honest, my grip on prepositions is a touch tenuous.)