Archive for Bayesian inference

machine learning methods are useful for ABC [or my first PCI Evol Biol!]

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , on November 23, 2017 by xi'an

While I am still working on setting a PCI [peer community in] Comput Stats, having secure sponsorship of some societies (ASA, KSS, RSS, SFdS, and hopefully ISBA), my coauthors Jean-Michel Marin and Louis Raynal submitted our paper ABC random forests for Bayesian parameter inference to PCI Evol Biol. And after a few months of review, including a revision accounting for the reviewers’ requests, our paper stood the test and the recommendation by Michael Blum and Dennis Prangle got published there. Great news, and hopefully helpful for our submission within the coming days!

postdocs positions in Uppsala in computational stats for machine learning

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on October 22, 2017 by xi'an

Lawrence Murray sent me a call for two postdoc positions in computational statistics and machine learning. In Uppsala, Sweden. With deadline November 17. Definitely attractive for a fresh PhD! Here are some of the contemplated themes:

(1) Developing efficient Bayesian inference algorithms for large-scale latent variable models in data rich scenarios.

(2) Finding ways of systematically combining different inference techniques, such as variational inference, sequential Monte Carlo, and deep inference networks, resulting in new methodology that can reap the benefits of these different approaches.

(3) Developing efficient black-box inference algorithms specifically targeted at inference in probabilistic programs. This line of research may include implementation of the new methods in the probabilistic programming language Birch, currently under development at the department.

Astrostatistics school

Posted in Mountains, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on October 17, 2017 by xi'an

What a wonderful week at the Astrostat [Indian] summer school in Autrans! The setting was superb, on the high Vercors plateau overlooking both Grenoble [north] and Valence [west], with the colours of the Fall at their brightest on the foliage of the forests rising on both sides of the valley and a perfect green on the fields at the centre, with sun all along, sharp mornings and warm afternoons worthy of a late Indian summer, too many running trails [turning into X country ski trails in the Winter] to contemplate for a single week [even with three hours of running over two days], many climbing sites on the numerous chalk cliffs all around [but a single afternoon for that, more later in another post!]. And of course a group of participants eager to learn about Bayesian methodology and computational algorithms, from diverse [astronomy, cosmology and more] backgrounds, trainings and countries. I was surprised at the dedication of the participants travelling all the way from Chile, Péru, and Hong Kong for the sole purpose of attending the school. David van Dyk gave the first part of the school on Bayesian concepts and MCMC methods, Roberto Trotta the second part on Bayesian model choice and hierarchical models, and myself a third part on, surprise, surprise!, approximate Bayesian computation. Plus practicals on R.

As it happens Roberto had to cancel his participation and I turned for a session into Christian Roberto, presenting his slides in the most objective possible fashion!, as a significant part covered nested sampling and Savage-Dickey ratios, not exactly my favourites for estimating constants. David joked that he was considering postponing his flight to see me talk about these, but I hope I refrained from engaging into controversy and criticisms… If anything because this was not of interest for the participants. Indeed when I started presenting ABC through what I thought was a pedestrian example, namely Rasmus Baath’s socks, I found that the main concern was not running an MCMC sampler or a substitute ABC algorithm but rather an healthy questioning of the construction of the informative prior in that artificial setting, which made me quite glad I had planned to cover this example rather than an advanced model [as, e.g., one of those covered in the packages abc, abctools, or abcrf]. Because it generated those questions about the prior [why a Negative Binomial? why these hyperparameters? &tc.] and showed how programming ABC turned into a difficult exercise even in this toy setting. And while I wanted to give my usual warning about ABC model choice and argue for random forests as a summary selection tool, I feel I should have focussed instead on another example, as this exercise brings out so clearly the conceptual difficulties with what is taught. Making me quite sorry I had to leave one day earlier. [As did missing an extra run!] Coming back by train through the sunny and grape-covered slopes of Burgundy hills was an extra reward [and no one in the train commented about the local cheese travelling in my bag!]

 

Nature snapshots [and snide shots]

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on October 12, 2017 by xi'an

A very rich issue of Nature I received [late] just before leaving for Warwick with a series of reviews on quantum computing, presenting machine learning as the most like immediate application of this new type of computing. Also including irate letters and an embarassed correction of an editorial published the week before reflecting on the need (or lack thereof) to remove or augment statues of scientists whose methods were unethical, even when eventually producing long lasting advances. (Like the 19th Century gynecologist J. Marion Sims experimenting on female slaves.) And a review of a book on the fascinating topic of Chinese typewriters. And this picture above of a flooded playground that looks like a piece of abstract art thanks to the muddy background.

“Quantum mechanics is well known to produce atypical patterns in data. Classical machine learning methods such as deep neural networks frequently have the feature that they can both recognize statistical patterns in data and produce data that possess the same statistical patterns: they recognize the patterns that they produce. This observation suggests the following hope. If small quantum information processors can produce statistical patterns that are computationally difficult for a classical computer to produce, then perhaps they can also recognize patterns that are equally difficult to recognize classically.” Jacob Biamonte et al., Nature, 14 Sept 2017

One of the review papers on quantum computing is about quantum machine learning. Although like Jon Snow I know nothing about this, I find it rather dull as it spends most of its space on explaining existing methods like PCA and support vector machines. Rather than exploring potential paradigm shifts offered by the exotic nature of quantum computing. Like moving to Bayesian logic that mimics a whole posterior rather than produces estimates or model probabilities. And away from linear representations. (The paper mentions a O(√N) speedup for Bayesian inference in a table, but does not tell more, which may thus be only about MAP estimators for all I know.) I also disagree with the brave new World tone of the above quote or misunderstand its meaning. Since atypical and statistical cannot but clash, “universal deep quantum learners may recognize and classify patterns that classical computers cannot” does not have a proper meaning. The paper contains a vignette about quantum Boltzman machines that finds a minimum entropy approximation to a four state distribution, with comments that seem to indicate an ability to simulate from this system.

ACDC versus ABC

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , on June 12, 2017 by xi'an

At the Bayes, Fiducial and Frequentist workshop last month, I discussed with the authors of this newly arXived paper, Approximate confidence distribution computing, Suzanne Thornton and Min-ge Xie. Which they abbreviate as ACC and not as ACDC. While I have discussed the notion of confidence distribution in some earlier posts, this paper aims at producing proper frequentist coverage within a likelihood-free setting. Given the proximity with our recent paper on the asymptotics of ABC, as well as with Li and Fearnhead (2016) parallel endeavour, it is difficult (for me) to spot the actual distinction between ACC and ABC given that we also achieve (asymptotically) proper coverage when the limiting ABC distribution is Gaussian, which is the case for a tolerance decreasing quickly enough to zero (in the sample size).

“Inference from the ABC posterior will always be difficult to justify within a Bayesian framework.”

Indeed the ACC setting is eerily similar to ABC apart from the potential of the generating distribution to be data dependent. (Which is fine when considering that the confidence distributions have no Bayesian motivation but are a tool to ensure proper frequentist coverage.) That it is “able to offer theoretical support for ABC” (p.5) is unclear to me, given both this data dependence and the constraints it imposes on the [sampling and algorithmic] setting. Similarly, I do not understand how the authors “are not committing the error of doubly using the data” (p.5) and why they should be concerned about it, standing outside the Bayesian framework. If the prior involves the data as in the Cauchy location example, it literally uses the data [once], followed by an ABC comparison between simulated and actual data, that uses the data [a second time].

“Rather than engaging in a pursuit to define a moving target such as [a range of posterior distributions], ACC maintains a consistently clear frequentist interpretation (…) and thereby offers a consistently cohesive interpretation of likelihood-free methods.”

The frequentist coverage guarantee comes from a bootstrap-like assumption that [with tolerance equal to zero] the distribution of the ABC/ACC/ACDC random parameter around an estimate of the parameter given the summary statistic is identical to the [frequentist] distribution of this estimate around the true parameter [given the true parameter, although this conditioning makes no sense outside a Bayesian framework]. (There must be a typo in the paper when the authors define [p.10] the estimator as minimising the derivative of the density of the summary statistic, while still calling it an MLE.) That this bootstrap-like assumption holds is established (in Theorem 1) under a CLT on this MLE and assumptions on the data-dependent proposal that connect it to the density of the summary statistic. Connection that seem to imply a data-dependence as well as a certain knowledge about this density. What I find most surprising in this derivation is the total absence of conditions or even discussion on the tolerance level which, as we have shown, is paramount to the validation or invalidation of ABC inference. It sounds like the authors of Approximate confidence distribution computing are setting ε equal to zero for those theoretical derivations. While in practice they apply rules [for choosing ε] they do not voice out, but which result in very different acceptance rates for the ACC version they oppose to an ABC version. (In all illustrations, it seems that ε=0.1, which does not make much sense.) All in all, I am thus rather skeptical about the practical implications of the paper in that it seems to achieve confidence guarantees by first assuming proper if implicit choices of summary statistics and parameter generating distribution.

efficient acquisition rules for ABC

Posted in pictures, Statistics, University life with tags , , , , , , , , on June 5, 2017 by xi'an

A few weeks ago, Marko Järvenpää, Michael Gutmann, Aki Vehtari and Pekka Marttinen arXived a paper on sampling design for ABC that reminded me of presentations Michael gave at NIPS 2014 and in Banff last February. The main notion is that, when the simulation from the model is hugely expensive, random sampling does not make sense.

“While probabilistic modelling has been used to accelerate ABC inference, and strategies have been proposed for selecting which parameter to simulate next, little work has focused on trying to quantify the amount of uncertainty in the estimator of the ABC posterior density itself.”

The above question  is obviously interesting, if already considered in the literature for it seems to focus on the Monte Carlo error in ABC, addressed for instance in Fearnhead and Prangle (2012), Li and Fearnhead (2016) and our paper with David Frazier, Gael Martin, and Judith Rousseau. With corresponding conditions on the tolerance and the number of simulations to relegate Monte Carlo error to a secondary level. And the additional remark that the (error free) ABC distribution itself is not the ultimate quantity of interest. Or the equivalent (?) one that ABC is actually an exact Bayesian method on a completed space.

The paper initially confused me for a section on the very general formulation of ABC posterior approximation and error in this approximation. And simulation design for minimising this error. It confused me as it sounded too vague but only for a while as the remaining sections appear to be independent. The operational concept of the paper is to assume that the discrepancy between observed and simulated data, when perceived as a random function of the parameter θ, is a Gaussian process [over the parameter space]. This modelling allows for a prediction of the discrepancy at a new value of θ, which can be chosen as maximising the variance of the likelihood approximation. Or more precisely of the acceptance probability. While the authors report improved estimation of the exact posterior, I find no intuition as to why this should be the case when focussing on the discrepancy, especially because small discrepancies are associated with parameters approximately generated from the posterior.

La déraisonnable efficacité des mathématiques

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on May 11, 2017 by xi'an

Although it went completely out of my mind, thanks to a rather heavy travel schedule, I gave last week a short interview about the notion of mathematical models, which got broadcast this week on France Culture, one of the French public radio channels. Within the daily La Méthode Scientifique show, which is a one-hour emission on scientific issues, always a [rare] pleasure to listen to. (Including the day they invited Claire Voisin.) The theme of the show that day was about the unreasonable effectiveness of mathematics, with the [classical] questioning of whether it is an efficient tool towards solving scientific (and inference?) problems because the mathematical objects pre-existed their use or we are (pre-)conditioned to use mathematics to solve problems. I somewhat sounded like a dog in a game of skittles, but it was interesting to listen to the philosopher discussing my relativistic perspective [provided you understand French!]. And I appreciated very much the way Céline Loozen the journalist who interviewed me sorted the chaff from the wheat in the original interview to make me sound mostly coherent! (A coincidence: Jean-Michel Marin got interviewed this morning on France Inter, the major public radio, about the Grothendieck papers.)