Archive for University of Oxford

Au’Bayes 17

Posted in Statistics, Travel, University life on December 14, 2017 by xi'an

Some notes scribbled during the O’Bayes 17 conference in Austin, not reflecting on the highly diverse range of talks. And many new faces and topics, meaning O’Bayes is alive and evolving. With all possible objectivity, a fantastic conference! (Not even mentioning the bars where Peter Müller hosted the poster sessions, a feat I would have loved to see duplicated for the posters of ISBA 2018… Or the Ethiopian restaurant just around the corner with the right amount of fierce spices!)

The wiki on objective, reference, vague, neutral [or whichever label one favours] priors that was suggested at the previous O’Bayes meeting in Valencià was introduced as Wikiprevia by Gonzalo Garcia-Donato. It aims at classifying recommended priors in most of the classical models, along with discussion panels, and it should soon get an official launch, when contributors will be welcome to add articles on the wiki principle. I wish the best to this venture which, I hope, will induce O’Bayesians to contribute actively.

In a brilliant talk that quickly reversed my jetlag doziness, Peter Grünwald returned to the topic he presented last year in Sardinia, namely safe Bayes or powered-down likelihoods to handle some degree of misspecification, with a further twist of introducing an impossible value ‘o’ that captures missing mass (to be called Peter’s demon?!), the absolute necessity of which I did not perceive. Food for thought, definitely. (But I feel that the only safe Bayes is the dead Bayes, as protecting against all kinds of misspecifications means no action is possible.)
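For readers unfamiliar with powered-down likelihoods, here is a minimal sketch in the simplest conjugate setting, a normal mean with known variance, where raising the likelihood to a power η in (0,1] amounts to discounting the sample size. The function name, the default prior and the fixed η are assumptions made for illustration; this is neither the way safe Bayes selects η from the data nor a rendering of the impossible-value construction from the talk.

```python
def tempered_normal_posterior(xs, eta, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior of a normal mean when the likelihood is raised to the power eta.

    Model (assumed for illustration): x_i ~ N(theta, noise_var), theta ~ N(prior_mean, prior_var).
    Returns the posterior mean and variance of theta.
    """
    n = len(xs)
    # The power eta multiplies the log-likelihood, hence the data precision n / noise_var.
    precision = 1.0 / prior_var + eta * n / noise_var
    mean = (prior_mean / prior_var + eta * sum(xs) / noise_var) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    for eta in (1.0, 0.5, 0.0):
        print(eta, tempered_normal_posterior(data, eta))
```

With η=1 this is the usual conjugate update, with η=0.5 the observations count for half as many, so the posterior stays closer to the prior and keeps a larger variance, and with η=0 the data are ignored altogether.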

I also appreciated Cristiano Villa’s approach to constructing prior weights in model comparison from a principled and decision-theoretic perspective even though I felt that the notion of ranking parameter importance required too much input to be practically feasible. (Unless I missed that point.)

Laura Ventura gave her talk on using various scores or estimating equations as summary statistics for ABC, rather than the corresponding M-estimators, which offers the appealing feature of reducing computation while being asymptotically equivalent. (A feature we also exploited for the regular score function in our ABC paper with Gael, David, Brendan, and Worapree.) She mentioned the Hyvärinen score [of which I first heard in Padova!] as a way to bypass issues related to doubly intractable likelihoods. Which is a most interesting proposal that bypasses (ABC) simulations from such complex targets by exploiting a pseudo-posterior.

Veronika Rockova presented a recent work on concentration rates for regression tree methods that provides a rigorous analysis of these methods. Showing that the spike & slab priors plus BART [equals spike & tree] achieve sparsity and optimal concentration. In an oracle sense. With a side entry on assembling partition trees towards creating a new form of BART. Which made me wonder whether or not this was also applicable to random forests. Although they are not exactly Bayes. Demanding work in terms of the theory behind it but with impressive consequences!

Just before I left O’Bayes 17 for Houston airport, Nick Polson, along with Peter McCullagh, proposed an intriguing notion of sparse Bayes factors, which corresponds to the limit of a Bayes factor when the prior probability υ of the null goes to zero. The limiting prior is then replaced with an exceedance measure that can be normalised into a distribution, but does it make the limit a special prior? Linking υ with the prior under the null is not an issue (this was the basis of my 1992 Lindley paradox paper) but the sequence of priors indexed by υ needs to be chosen. And reading from the paper at Houston airport, I could not spot a construction principle that would lead to a reference prior of sorts. One thing that Nick mentioned during his talk was that we directly observe realisations of the data marginal, but this is generally not the case as the observations are associated with a given value of the parameter, not one for each observation.

The next edition of the O’Bayes conference will be in… Warwick on June 29-July 2, as I volunteered to organise this edition (16 years after O’Bayes 03 in Aussois!) just after the BNP meeting in Oxford on June 23-28, hopefully creating the environment for fruitful interactions between both communities! (And jumping from Au’Bayes to Wa’Bayes.)

oxwasp@amazon.de

Posted in Books, Kids, pictures, Running, Statistics, Travel, University life on April 12, 2017 by xi'an

The reason for my short visit to Berlin last week was an OxWaSP (Oxford and Warwick Statistics Program) workshop hosted by Amazon Berlin with talks between statistics and machine learning, plus posters from our second year students. While the workshop was quite intense, I enjoyed very much the atmosphere and the variety of talks there. (Just sorry that I left too early to enjoy the social programme at a local brewery, Brauhaus Lemke, and the natural history museum. But still managed nice runs east and west!) One thing I found most interesting (if obvious in retrospect) was the different focus of academic and production talks, where the latter do not aim at full generality or at a guaranteed improvement over the existing, as long as the new methodology brings a gain in efficiency over the existing.

This connected nicely with me reading several Nature articles on quantum computing during that trip, where researchers from Google predict commercial products appearing in the coming five years, even though the technology is far from perfect and the outcome qubit error-prone. Among the examples they provided were quantum simulation (not meaning what I consider to be simulation!), quantum optimisation (as a way to overcome multimodality), and quantum sampling (targeting given probability distributions). I find the inclusion of the last one puzzling in that simulation (in that sense) shows very little tolerance for errors, especially systematic bias. It may be that specific quantum architectures can be designed for specific probability distributions, just like some are already conceived for optimisation. (It may even be the case that quantum solutions are (just next to) available for intractable constants as in Ising or Potts models!)

automated ABC summary combination

Posted in Books, pictures, Statistics, University life on March 16, 2017 by xi'an

Jonathan Harrison and Ruth Baker (Oxford University) arXived this morning a paper on the optimal combination of summaries for ABC in the sense of deriving the proper weights in a Euclidean distance involving all the available summaries. The idea is to find the weights that lead to the maximal distance between prior and posterior, in a way reminiscent of Bernardo’s (1979) maximal information principle. Plus a sparsity penalty à la Lasso. The associated algorithm is sequential in that the weights are updated at each iteration. The paper does not get into theoretical justifications but considers instead several examples with limited numbers of both parameters and summary statistics. Which may highlight the limitations of the approach in that handling (and eliminating) a large number of parameters may prove impossible this way, when compared with optimisation methods like random forests. Or summary-free distances between empirical distributions like the Wasserstein distance.
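To fix ideas about what a weighted Euclidean distance between summaries does in ABC, here is a toy rejection sampler where the weights are supplied by hand. Everything in it is an assumption made for illustration (the normal mean model, the uniform prior, the pure-noise second summary, the function names, and the convention that each weight multiplies a squared difference); in particular it does not implement the optimisation of the weights or the sparsity penalty from the paper.

```python
import math
import random


def weighted_distance(s_sim, s_obs, weights):
    """Weighted Euclidean distance; each weight multiplies a squared difference (assumed convention)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, s_sim, s_obs)))


def summaries(data, rng):
    """Two toy summaries: the sample mean (informative) and pure noise (uninformative)."""
    return [sum(data) / len(data), rng.gauss(0.0, 1.0)]


def abc_rejection(s_obs, weights, n_sims=2000, keep=100, n_obs=20, seed=0):
    """Keep the `keep` prior draws whose simulated summaries are closest to s_obs."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)                        # hypothetical uniform prior
        data = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # hypothetical N(theta, 1) model
        draws.append((weighted_distance(summaries(data, rng), s_obs, weights), theta))
    draws.sort(key=lambda pair: pair[0])
    return [theta for _, theta in draws[:keep]]


if __name__ == "__main__":
    s_obs = [1.0, 0.0]
    for weights in ([1.0, 1.0], [1.0, 0.0]):
        sample = abc_rejection(s_obs, weights)
        mean = sum(sample) / len(sample)
        spread = math.sqrt(sum((t - mean) ** 2 for t in sample) / len(sample))
        print(weights, round(mean, 3), round(spread, 3))
```

Setting the weight of the noise summary to zero removes it from the distance, which is the kind of elimination a sparsity penalty is meant to achieve automatically.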

Oxford snapshot [jatp]

Posted in Books, Kids, pictures, Travel, University life on February 9, 2017 by xi'an

relativity is the keyword

Posted in Books, Statistics, University life on February 1, 2017 by xi'an

St John's College, Oxford, Feb. 23, 2012

As I was teaching my introduction to Bayesian Statistics this morning, ending up with the chapter on tests of hypotheses, I found myself reflecting [out loud] on the relative nature of posterior quantities. Just like when I introduced the role of priors in Bayesian analysis the day before, I stressed the relativity of quantities coming out of the BBB [Big Bayesian Black Box], namely that whatever happens as a Bayesian procedure is to be understood, scaled, and relativised against the prior equivalent, i.e., that the reference measure or gauge is the prior. This is sort of obvious, clearly, but bringing the argument forward from the start avoids all sorts of misunderstanding and disagreement, in that it excludes the claims of absoluteness and certainty that may come with the production of a posterior distribution. It also removes the endless debate about the determination of the prior, by making each prior a reference on its own. With an additional possibility of calibration by simulation under the assumed model. Or an alternative. Again nothing new there, but I got rather excited by this presentation choice, as it seems to clarify the path to Bayesian modelling and avoid misapprehensions.

Further, the curious case of the Bayes factor (or of the posterior probability) could possibly be resolved most satisfactorily in this framework, as the [dreaded] dependence on the model prior probabilities then becomes a matter of relativity! Those posterior probabilities depend directly and almost linearly on the prior probabilities, but they should not be interpreted in an absolute sense as the ultimate and unique probability of the hypothesis (which anyway does not mean anything in terms of the observed experiment). In other words, this posterior probability does not need to be scaled against a U(0,1) distribution. Or against the p-value if anyone wishes to do so. By the end of the lecture, I was even wondering [not so loudly] whether or not this perspective was allowing for a resolution of the Lindley-Jeffreys paradox, as the resulting number could be set relative to the choice of the [arbitrary] normalising constant.
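To make the dependence on the prior weights explicit, here is the standard identity linking the posterior probability of a null hypothesis to its prior probability and to the Bayes factor. The symbols H_0, ρ_0 and B_01 are notation introduced here for illustration and do not appear in the post.

```latex
\mathbb{P}(H_0 \mid x)
  = \left[\, 1 + \frac{1-\rho_0}{\rho_0}\,\frac{1}{B_{01}(x)} \,\right]^{-1},
\qquad
\frac{\mathbb{P}(H_0 \mid x)}{1-\mathbb{P}(H_0 \mid x)}
  = \frac{\rho_0}{1-\rho_0}\; B_{01}(x).
```

The second form shows that the posterior odds are exactly linear in the prior odds, with the Bayes factor as the slope, so the posterior probability only makes sense relative to the prior probability ρ_0 one started from.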

back in Oxford

Posted in pictures, Statistics, Travel, University life on January 30, 2017 by xi'an

As in the previous years, I am back in Oxford (England) for my short Bayesian Statistics course in the joint Oxford-Warwick PhD programme, OxWaSP. For some unclear reason, presumably related to the Internet connection from Oxford, I have not been able to upload my slides to Slideshare, so here is the [99.9% identical] older version:

anytime algorithm

Posted in Books, Statistics on January 11, 2017 by xi'an

Lawrence Murray, Sumeet Singh, Pierre Jacob, and Anthony Lee (Warwick) recently arXived a paper on Anytime Monte Carlo. (The earlier post on this topic is no coincidence, as Lawrence had told me about this problem when he visited Paris last Spring. Including a forced extension when his passport got stolen.) The difficulty with anytime algorithms for MCMC is the lack of exchangeability of the MCMC sequence (except for formal settings where regeneration can be used).

When accounting for the duration of computation between steps of an MCMC generation, the Markov chain turns into a Markov jump process, whose stationary distribution α is biased by the average delivery time. Unless it is constant. The authors manage this difficulty by interlocking the original chain with a secondary chain so that even- and odd-index chains are independent. The secondary chain is then discarded. This provides a way to run an anytime MCMC. The principle can be extended to K+1 chains, run one after the other, since only one of those chains needs to be discarded. It also applies to SMC and SMC². The appeal of anytime simulation in this particle setting is that resampling is no longer a bottleneck. Hence easily distributed among processors. One aspect I do not fully understand is how the computing budget is handled, since allocating the same real time to each iteration of SMC seems to envision each target in the sequence as requiring the same amount of time. (An interesting side remark made in this paper is the lack of exchangeability resulting from elaborate resampling mechanisms, a lack I had not thought of before.)
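The bias induced by stopping at a fixed real time can be seen on a toy example, sketched below under loud assumptions: a two-state chain whose moves are independent uniform draws (so the chain itself targets (1/2, 1/2)), and exponential computing times with means 1 and 3 depending on the current state. Function names and numerical values are hypothetical, and the sketch only illustrates the length-biasing effect, not the multiple-chain correction proposed by the authors.

```python
import random


def state_at_deadline(deadline, mean_hold, rng):
    """Run the toy chain on a real-time clock and return the state in progress at the deadline."""
    clock = 0.0
    state = rng.randrange(2)  # start from the target, uniform on {0, 1}
    while True:
        # Real time spent "computing" while the chain sits in the current state.
        clock += rng.expovariate(1.0 / mean_hold[state])
        if clock > deadline:
            return state
        state = rng.randrange(2)  # independent move, stationary distribution (1/2, 1/2)


def share_of_state_one(n_runs=4000, deadline=50.0, mean_hold=(1.0, 3.0), seed=1):
    """Fraction of independent runs found in state 1 when interrupted at the deadline."""
    rng = random.Random(seed)
    hits = sum(state_at_deadline(deadline, mean_hold, rng) for _ in range(n_runs))
    return hits / n_runs


if __name__ == "__main__":
    print(share_of_state_one())
```

Although the chain spends half of its iterations in each state, the state found at the deadline is the slow one about three times out of four, since the jump process has stationary weights proportional to the target probability times the mean holding time, here (1/2 × 1, 1/2 × 3). With equal mean holding times the bias vanishes, in agreement with the "unless it is constant" remark.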